dsandler.org

Archive for September 8th, 2007

A couple of years ago I worked on a TrackBack Validator, which identified and rejected TrackBacks posted to your blog from sites that didn’t actually link to it.
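
To make the idea concrete, here is a minimal Python sketch of inbound-link validation. It is not the Validator's actual code: the names `LinkCollector`, `normalize`, and `links_back` are made up for illustration, the trailing-slash normalization is an assumption, and it works on an already-fetched HTML string rather than making any network requests.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def normalize(url):
    # Assumption for this sketch: surrounding whitespace and a trailing
    # slash are the only differences we forgive when comparing URLs.
    return url.strip().rstrip("/")


def links_back(source_html, target_url):
    """Return True if source_html contains an <a href> pointing at target_url."""
    collector = LinkCollector()
    collector.feed(source_html)
    wanted = normalize(target_url)
    return any(normalize(href) == wanted for href in collector.hrefs)
```

A TrackBack would be accepted only if `links_back(page_html, my_post_url)` is true for the page that sent it. Note that text merely mentioning the URL doesn't count; only a real `<a href>` does.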

Trackback spam. (Figure taken from TR-06-876.)

In our 2006 tech report on the subject (co-authored by my advisor and a number of undergrads in his computer security class), we speculated that—given sufficiently widespread use of inbound-link validation—spammers would be forced to either (a) close up shop, moving on to some other exploitable technology, or (b) start actually linking to their victims. To wit:

Spammers who wish to overcome our mechanism are forced to indefinitely maintain reciprocal links from their own web sites, effectively increasing their necessary investment of time and resources. Furthermore, the spammer’s site, by linking to its victims, will actually benefit the victims’ search engine rankings by sharing part of the spammer’s ranking with each of its victims. Best of all, if the spammer is effectively publishing a list of its victims, that list would provide compelling evidence that could be used against the spammer in legal proceedings.

In the limit, we are effectively pushing spammers to run “legitimate” weblogs. If spammers’ weblogs are following the TrackBack protocol correctly and are legitimately providing reciprocal links, then we face a more fundamental question: is such a TrackBack message actually spam? If a “real” blog is linking to the victim, regardless of any spam-like content it might contain, then the TrackBack the victim receives could well be defined as “legitimate.” At that point, the issue is not one of spam vs. non-spam, but rather one of relevance.

Well, we were right and not right. I just received some TrackBack spam (probably not coincidentally, on a blog post about TrackBack spam) that fooled the Validator and yet can’t really be considered legitimate.

A tricky TrackBack.

The inbound link is included, but hidden from the user with CSS tricks! Here’s an excerpt of the source of the page:

  <style type="text/css" media="screen">
    .trackback { position:absolute; top:0px; left:0px; visibility:hidden; }
  </style>
  <div>
    <div class="trackback">
    [...]
    <p>
      [...] far out site now comment this synopsis
      <a href='http://dsandler.org/wp/archives/2005/11/14/trackback-spammers-upping-the-ante'>http://dsandler.org/wp/archives/2005/11/14/trackback-spammers-upping-the-ante</a>
      and give comments [...]
    </p>

As you can see, all the inbound links are surrounded with irrelevant content, but what’s more, they’re children of the <div class="trackback"> and hence invisible to readers. In our paper we point to readers as one of two “last resorts” to help weed out irrelevant but otherwise Validated TrackBacks; obviously they won’t be able to help here. (The other technique, which would still work in this case, is the same sort of statistical classification currently used for email; see §5 of the TR for details.)

In the end, this “break” of the Validator may not yield much for this spammer aside from the satisfaction of successfully defacing my blog. Google has been known to apply a PageRank penalty to websites with large regions of hidden text, so the currency gained by inbound links may very well be more than offset. What’s more, like most modern blogs and CMSes, dsandler.org applies rel="nofollow" to any links found in comments or TrackBacks, so the spammer gets zero Google-juice in this situation.
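
For the curious, here is roughly what "applying nofollow" amounts to, as a small Python sketch. This is not how WordPress (or this site) actually does it; `add_nofollow` is a hypothetical helper, and rewriting HTML with regular expressions is a shortcut that only copes with tidy markup.

```python
import re

# Matches an opening <a ...> tag and captures its attribute text.
A_TAG_RE = re.compile(r"<a\b([^>]*)>", re.I)
# Matches an existing quoted rel attribute and captures its value.
REL_RE = re.compile(r"""\brel\s*=\s*(["'])(.*?)\1""", re.I)


def add_nofollow(html):
    """Return html with "nofollow" present in the rel of every <a> tag."""

    def fix(match):
        attrs = match.group(1)
        rel = REL_RE.search(attrs)
        if rel is None:
            # No rel attribute yet: append one.
            return "<a" + attrs + ' rel="nofollow">'
        values = rel.group(2).split()
        if "nofollow" in (v.lower() for v in values):
            # Already marked: leave the tag untouched.
            return match.group(0)
        # Keep the existing rel values and add nofollow to them.
        new_rel = 'rel="' + " ".join(values + ["nofollow"]) + '"'
        return "<a" + attrs[:rel.start()] + new_rel + attrs[rel.end():] + ">"

    return A_TAG_RE.sub(fix, html)
```

Run over the HTML of a comment or TrackBack excerpt before it is displayed, this leaves the link clickable for readers while telling search engines not to credit it.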

But since spam is so cheap, the spammer probably doesn’t care. That’s why the Validator was so important: it proved remarkably effective at reducing the “collateral damage” of spam, namely, blog defacement. To remain effective against this sort of attack, it would probably need to include some sort of CSS/DOM interpreter.

(Yuck.)
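
For a taste of what even a crude version might involve, here is a Python sketch that only counts links a reader could plausibly see. It is nowhere near a real CSS/DOM interpreter: it understands only bare `.classname { ... }` rules in `<style>` blocks plus inline `style` attributes, it treats `visibility:hidden` and `display:none` as the only ways to hide something, and it ignores external stylesheets, selectors of any other shape, off-screen positioning, and script-driven hiding. All the names (`hidden_classes`, `StyleCollector`, `LinkVisibility`, `visible_links`) are invented for this example.

```python
import re
from html.parser import HTMLParser

# Tags with no closing tag; they must not go on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Matches only the simplest rule shape: ".classname { declarations }".
RULE_RE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")
HIDING_RE = re.compile(r"visibility\s*:\s*hidden|display\s*:\s*none", re.I)


def hidden_classes(css):
    """Return the class names whose rule hides the element."""
    return {name for name, body in RULE_RE.findall(css) if HIDING_RE.search(body)}


class StyleCollector(HTMLParser):
    """First pass: gather the text of every <style> block."""

    def __init__(self):
        super().__init__()
        self.in_style = False
        self.css = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False

    def handle_data(self, data):
        if self.in_style:
            self.css.append(data)


class LinkVisibility(HTMLParser):
    """Second pass: record each link and whether it or an ancestor is hidden."""

    def __init__(self, hidden):
        super().__init__()
        self.hidden = hidden
        self.stack = []   # (tag, is_hidden) for each currently open element
        self.links = []   # (href, is_hidden) for each <a href> seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        inline = HIDING_RE.search(attrs.get("style") or "") is not None
        parent_hidden = self.stack[-1][1] if self.stack else False
        is_hidden = parent_hidden or inline or bool(classes & self.hidden)
        if tag == "a" and attrs.get("href"):
            self.links.append((attrs["href"], is_hidden))
        if tag not in VOID_TAGS:
            self.stack.append((tag, is_hidden))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def visible_links(html):
    """Return the hrefs of links that are not inside a hidden element."""
    styles = StyleCollector()
    styles.feed(html)
    walker = LinkVisibility(hidden_classes("".join(styles.css)))
    walker.feed(html)
    return [href for href, is_hidden in walker.links if not is_hidden]
```

A validator built this way would check for the inbound link in `visible_links(page_html)` instead of in every link on the page, so a link tucked inside a `visibility:hidden` div would no longer count. Every gap listed above is another place for a spammer to hide, which is rather the point of the "yuck."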

For more on all these icky edge cases in TrackBack spam (and other forms of Web spam), read the report. (It’s just a six-pager.)

dsandler.org is Dan Sandler's website and notebook.