dsandler.org

Tag: spam

A couple of years ago I worked on a TrackBack Validator which identified and rejected TrackBacks posted on your blog from sites that didn’t actually link to your blog.

Trackback spam. (Figure taken from TR-06-876.)
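The inbound-link check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin's actual code: `LinkCollector` and `links_to` are hypothetical names, and fetching the sender's page (and any fuzzier URL matching) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def _host(url):
    """Hostname of an absolute URL, or None for relative/malformed ones."""
    try:
        return urlsplit(url).hostname
    except ValueError:
        return None


def links_to(page_html, my_host):
    """Return True if any link in page_html points at my_host.

    Hypothetical helper: a TrackBack whose source page fails this
    check would be rejected.
    """
    collector = LinkCollector()
    collector.feed(page_html)
    collector.close()
    return any(_host(h) == my_host for h in collector.hrefs)
```

Note that this looks only at the markup, not at what a reader would actually see on the page.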

In our 2006 tech report on the subject (co-authored by my advisor and a number of undergrads in his computer security class), we speculated that—given sufficiently widespread use of inbound-link validation—spammers would be forced to either (a) close up shop, moving on to some other exploitable technology, or (b) start actually linking to their victims. To wit:

Spammers who wish to overcome our mechanism are forced to indefinitely maintain reciprocal links from their own web sites, effectively increasing their necessary investment of time and resources. Furthermore, the spammer’s site, by linking to its victims, will actually benefit the victims’ search engine rankings by sharing part of the spammer’s ranking with each of its victims. Best of all, if the spammer is effectively publishing a list of its victims, that list would provide compelling evidence that could be used against the spammer in legal proceedings.

In the limit, we are effectively pushing spammers to run “legitimate” weblogs. If spammers’ weblogs are following the TrackBack protocol correctly and are legitimately providing reciprocal links, then we face a more fundamental question: is such a TrackBack message actually spam? If a “real” blog is linking to the victim, regardless of any spam-like content it might contain, then the TrackBack the victim receives could well be defined as “legitimate.” At that point, the issue is not one of spam vs. non-spam, but rather one of relevance.

Well, we were right and not right. I just received some TrackBack spam (probably not coincidentally, on a blog post about trackback spam) that fooled the Validator and yet can’t really be considered to be legitimate.

A tricky TrackBack.

The inbound link is included, but hidden from the user with CSS tricks! Here’s an excerpt of the source of the page:

  <style type="text/css" media="screen">
    .trackback { position:absolute; top:0px; left:0px; visibility:hidden; }
  </style>
  <div>
    <div class="trackback">
    [...]
  	<p>
  	  [...] far out site now comment this synopsis
  	  <a href='http://dsandler.org/wp/archives/2005/11/14/trackback-spammers-upping-the-ante'>http://dsandler.org/wp/archives/2005/11/14/trackback-spammers-upping-the-ante</a>
  	  and give comments [...]
  	</p>

As you can see, all the inbound links are surrounded with irrelevant content, but what’s more, they’re children of the <div class="trackback"> and hence invisible to readers. In our paper we point to readers as one of two “last resorts” to help weed out irrelevant but otherwise Validated TrackBacks; obviously they won’t be able to help here. (The other technique, which would still work in this case, is the same sort of statistical classification currently used for email; see §5 of the TR for details.)

In the end, this “break” of the Validator may not yield much for this spammer aside from the satisfaction of successfully defacing my blog. Google has been known to apply a PageRank penalty to websites with large regions of hidden text, so the currency gained by inbound links may very well be more than offset. What’s more, like most modern blogs and CMSes, dsandler.org applies rel="nofollow" to any links found in comments or TrackBacks, so the spammer gets zero Google-juice in this situation.

But since spam is so cheap, the spammer probably doesn’t care. That’s why the Validator was so important: it proved remarkably effective at reducing the “collateral damage” of spam, namely, blog defacement. In order to continue to be effective against this sort of attack, it would probably need to include some sort of CSS/DOM interpreter.

(Yuck.)
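To give a feel for what even a crude version of that would involve, here is a rough Python sketch that flags links sitting inside elements hidden by a simple class rule or an inline style. It is an assumption-laden toy, not a real CSS/DOM interpreter: it only understands `.classname { ... }` rules and inline `style` attributes, ignores external stylesheets and scripts, and can be confused by unclosed tags. All names are hypothetical.

```python
import re
from html.parser import HTMLParser

HIDING = re.compile(r"visibility\s*:\s*hidden|display\s*:\s*none", re.I)
CLASS_RULE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class StyleCollector(HTMLParser):
    """First pass: gather class names whose CSS rules hide the element."""

    def __init__(self):
        super().__init__()
        self.in_style = False
        self.css = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False

    def handle_data(self, data):
        if self.in_style:
            self.css.append(data)

    def hidden_classes(self):
        rules = CLASS_RULE.findall("".join(self.css))
        return {name for name, body in rules if HIDING.search(body)}


class LinkVisibility(HTMLParser):
    """Second pass: record each link and whether an ancestor hides it."""

    def __init__(self, hidden_classes):
        super().__init__()
        self.hidden_classes = hidden_classes
        self.stack = []  # (tag, hidden?) for each open element
        self.links = []  # (href, hidden?)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        hidden = (
            (self.stack[-1][1] if self.stack else False)
            or any(c in self.hidden_classes for c in classes)
            or bool(HIDING.search(attrs.get("style") or ""))
        )
        if tag == "a" and attrs.get("href"):
            self.links.append((attrs["href"], hidden))
        if tag not in VOID_TAGS:
            self.stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def hidden_links(page_html):
    """Return the hrefs of links that a reader would not see."""
    styles = StyleCollector()
    styles.feed(page_html)
    styles.close()
    walker = LinkVisibility(styles.hidden_classes())
    walker.feed(page_html)
    walker.close()
    return [href for href, hidden in walker.links if hidden]
```

Run against a page shaped like the excerpt above, `hidden_links` would report the link inside `<div class="trackback">`, which a validator could then treat as if it were missing.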

For more on all these icky edge-cases in TrackBack (and other forms of Web) spam, read the report. (It’s just a six-pager.)

You were wrong: stevenf’s Spamusement is back!

Dear blog owners, I’ve got some new software you should try:

  1. Updated: The powerful TrackBack Validator plugin for WordPress has been revved to version 0.7. This plugin kills almost all existing TrackBack spam, dead. [I say “almost” because I (coincidentally) received word today of a spammer who sets up real blogs to try to spam people. I think of this as a victory for our plugin: It has forced spammers to, you know, behave like real bloggers. Who’s to say a TrackBack from this guy is spam and not a legitimate link to your blog?]
  2. New: Illuminati is a new Internet measurement project from the CoralCDN (“Coral cache”) guys. They’re gathering much-needed statistics (like these awesome graphs) about the edges of the Internet (that’s you!), including NAT and proxy statistics, and they need help from website operators everywhere. Visit the site to learn how to contribute, or (if you’re a WordPress blogger) install my Illuminati plugin for WordPress.

There’s a Chinese spammer out there Googling for Trac projects he can submit junk bugs to. The bugs contain page after page of advertising text for a website selling commercial-grade LED displays. The following is from my access_log, showing that this guy manually searches around for Trac ticket submission forms, then fires away.

61.48.126.237 - - [10/May/2006:10:00:24 -0500] "GET /project/report/6 HTTP/1.1" 200 107783 "http://www.google.cn/search?q=NEW+TICKET+Trac&hl=zh-CN&lr=&newwindow=1&start=990&sa=N&filter=0" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

61.48.126.237 - - [10/May/2006:10:00:44 -0500] "GET /project/newticket HTTP/1.1" 200 15998 "http://trac.feedtree.net/project/report/6" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

61.48.126.237 - - [10/May/2006:10:04:12 -0500] "POST /project HTTP/1.1" 302 14 "http://trac.feedtree.net/project/newticket" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

The spam text, which is entered in every possible field in the bug form, is Chinese (“LED显示屏”, and on and on); it translates to the usual list of key phrases: “color led display, double base color led display, outdoor led display, indoor led display, …” I’ve gotten 4 or 5 of these junk tickets now. Congratulations, 61.48.126.237, you’re the proud recipient of a new deny from rule!
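The search-then-POST pattern in those log lines is easy to pick out mechanically. Below is a small Python sketch that scans combined-format access log lines and lists the addresses that arrived from a search mentioning Trac and later issued a POST. The heuristic, the regular expression, and the `suspicious_ips` name are all assumptions made for illustration; a real log will have plenty of lines this does not parse.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Apache "combined" log format, as in the excerpt above.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def suspicious_ips(lines):
    """IPs that came in via a search mentioning Trac, then POSTed."""
    searched, flagged = set(), set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip anything that is not combined-format
        # Pull the q= parameter out of the referring URL, if any.
        query = parse_qs(urlsplit(m["referer"]).query).get("q", [""])[0]
        if "trac" in query.lower():
            searched.add(m["ip"])
        if m["method"] == "POST" and m["ip"] in searched:
            flagged.add(m["ip"])
    return sorted(flagged)
```

Each address it returns is a candidate for a deny rule, after a human has looked at what was actually posted.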

Someone out there has developed a crawler that attacks Trac wiki pages. Once it’s found a Trac installation, it posts an update to the WikiStart and TracIni pages. The new version appends a number of links, hidden from view using Trac’s syntax to allow arbitrary HTML:

{{{
#!html
<u style="display:none">
...nasty links...
</u>
}}}

I’ve been hit over at the FeedTree trac a few times; it’s infrequent enough that periodic checking of the timeline view is sufficient to spot and clean out the crud.
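If checking by eye gets tedious, the same inspection can be roughed out in Python: pull every `#!html` block out of a page's wiki text and flag the ones that hide their contents. This sketch assumes you already have the wiki text as a string (how to get it out of Trac is not covered here), and `hidden_html_blocks` is a made-up name.

```python
import re

# A Trac {{{ ... }}} block whose first line is the #!html processor.
HTML_BLOCK = re.compile(r"\{\{\{\s*#!html\s*(.*?)\}\}\}", re.S)
HIDING = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)


def hidden_html_blocks(wiki_text):
    """Return the bodies of #!html blocks that contain hiding CSS."""
    return [
        body
        for body in HTML_BLOCK.findall(wiki_text)
        if HIDING.search(body)
    ]
```

A non-empty result for WikiStart or TracIni would be a good reason to open the page history.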

(I guess you know your software has “made it” when someone else writes a piece of software specifically to attack it.)

A new technique in foiling content-based spam filters: using CSS rendering to construct text that the filter can’t see.

V<span style="float: right"> b </span>I<span style="float: right"> d </span>A<span style="float: right"> z …

The “chaff” characters (b, d, z, …) float to the right, while the letters “VIA” in the above example (followed by “GRA” in the source material) settle to the left, lining up in order. Your spam filter’s tokenizer sees nothing. (A previous version of this hack relied on HTML rendering, via comments the browser discards, to construct text that the filter can’t see. Example: V<!--foo-->I<!--foo-->A<!--foo-->G<!--foo-->R<!--foo-->A.)
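A filter could fight back by approximating what the browser shows before it tokenizes. The Python sketch below is one hedged way to do that: it drops HTML comments (the parser ignores them by default) and any text inside an element with an inline `float` style, then joins what is left. It is an assumption that inline styles are the only trick in play; stylesheet rules, scripts, and other layout games are not handled, and the names are hypothetical.

```python
import re
from html.parser import HTMLParser

FLOATED = re.compile(r"float\s*:\s*(left|right)", re.I)
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class InlineTextExtractor(HTMLParser):
    """Keep text that is not inside an inline-floated element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # True for each open element that is (inside) a float
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag
        style = dict(attrs).get("style") or ""
        inside = bool(self.stack and self.stack[-1])
        self.stack.append(inside or bool(FLOATED.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not (self.stack and self.stack[-1]):
            self.text.append(data)


def strip_chaff(html):
    """Return the text left over once comments and floats are removed."""
    parser = InlineTextExtractor()
    parser.feed(html)
    parser.close()  # flush any trailing text
    return "".join(parser.text)
```

Feeding the result to the tokenizer, rather than the raw markup, lets the usual content-based classifier see the word the reader sees.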

What happens when you come across a wiki that you can’t edit, but that someone else has already filled with loads and loads of spam links? (I guess you just leave it, but that seems kind of like walking past a soda can on the ground right next to the recycling bin.)

Update: Looks like the “edit” link was there, but obscured by the spam; a trip to the HTML revealed the correct URL for the edit form. Recycled!


dsandler.org is Dan Sandler's website and notebook.
