
I am currently a software engineer at Google, where as a member of the Android platform team I build frameworks and user interfaces.

The blog here at is mostly historical; you can find more recent posts on .

Validator foiled!

September 8th, 2007

A couple of years ago I worked on a TrackBack Validator which identified and rejected TrackBacks posted on your blog from sites that didn’t actually link to your blog.

Trackback spam. (Figure taken from TR-06-876.)
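The check at the heart of the Validator is simple: fetch the page a TrackBack claims to come from, and accept the ping only if that page links to the post being pinged. Below is a minimal Python sketch of just the link check. It assumes the source page's HTML has already been fetched. `links_back`, `_LinkCollector` and `_normalize` are hypothetical names, and this is not the plugin's actual code. Relative links, redirects and other URL spellings beyond a trailing slash or `#fragment` are ignored.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _normalize(url):
    # Drop any #fragment and trailing slash so trivially different
    # spellings of the same permalink compare equal (a simplifying assumption).
    return urldefrag(url.strip())[0].rstrip("/")


def links_back(source_html, target_url):
    """Return True if source_html contains an <a href> pointing at target_url."""
    parser = _LinkCollector()
    parser.feed(source_html)
    parser.close()
    target = _normalize(target_url)
    return any(_normalize(h) == target for h in parser.hrefs)
```

A TrackBack whose source page fails `links_back(page, permalink)` would be rejected. As the rest of this post shows, passing the check says nothing about whether a human can actually *see* the link.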

In our 2006 tech report on the subject (co-authored by my advisor and a number of undergrads in his computer security class), we speculated that—given sufficiently widespread use of inbound-link validation—spammers would be forced to either (a) close up shop, moving on to some other exploitable technology, or (b) start actually linking to their victims. To wit:

Spammers who wish to overcome our mechanism are forced to indefinitely maintain reciprocal links from their own web sites, effectively increasing their necessary investment of time and resources. Furthermore, the spammer’s site, by linking to its victims, will actually benefit the victims’ search engine rankings by sharing part of the spammer’s ranking with each of its victims. Best of all, if the spammer is effectively publishing a list of its victims, that list would provide compelling evidence that could be used against the spammer in legal proceedings.

In the limit, we are effectively pushing spammers to run “legitimate” weblogs. If spammers’ weblogs are following the TrackBack protocol correctly and are legitimately providing reciprocal links, then we face a more fundamental question: is such a TrackBack message actually spam? If a “real” blog is linking to the victim, regardless of any spam-like content it might contain, then the TrackBack the victim receives could well be defined as “legitimate.” At that point, the issue is not one of spam vs. non-spam, but rather one of relevance.

Well, we were right and not right. I just received some TrackBack spam (probably not coincidentally, on a blog post about trackback spam) that fooled the Validator and yet can’t really be considered to be legitimate.

A tricky TrackBack.

The inbound link is included, but hidden from the user with CSS tricks! Here’s an excerpt of the source of the page:

  <style type="text/css" media="screen">
    .trackback { position:absolute; top:0px; left:0px; visibility:hidden; }
  </style>
  <div>
    <div class="trackback">
    [...]
      <p>
        [...] far out site now comment this synopsis
        <a href='http://dsandler.org/wp/archives/2005/11/14/trackback-spammers-upping-the-ante'>http://dsandler.org/wp/archives/2005/11/14/trackback-spammers-upping-the-ante</a>
        and give comments [...]
      </p>

As you can see, all the inbound links are surrounded with irrelevant content, but what’s more, they’re children of the <div class="trackback"> and hence invisible to readers. In our paper we point to readers as one of two “last resorts” to help weed out irrelevant but otherwise Validated TrackBacks; obviously they won’t be able to help here. (The other technique, which would still work in this case, is the same sort of statistical classification currently used for email; see §5 of the TR for details.)
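To reject this page, a validator needs at least a crude idea of which elements are hidden. The Python sketch below is very naive and assumes several things. It reads only simple `.classname { ... }` rules from inline `<style>` blocks, and those blocks must appear before the content they hide. It treats only `visibility:hidden` and `display:none` as hiding. It also ignores external stylesheets, JavaScript, off-screen positioning and the like. `links_back_visibly` and `_VisibleLinkFinder` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urldefrag

# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# Declarations treated as "hiding" (a deliberately tiny subset of CSS).
HIDING = re.compile(r"visibility\s*:\s*hidden|display\s*:\s*none", re.I)

# Simple ".name { ... }" rules only; no selector lists, combinators, etc.
CLASS_RULE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")


def _normalize(url):
    # Ignore #fragments and trailing slashes when comparing permalinks.
    return urldefrag(url.strip())[0].rstrip("/")


class _VisibleLinkFinder(HTMLParser):
    def __init__(self, target_url):
        super().__init__()
        self.target = _normalize(target_url)
        self.hidden_classes = set()  # class names a <style> block hides
        self.stack = []              # (tag, hides_itself) per open element
        self.in_style = False
        self.found_visible = False

    def _inside_hidden(self):
        return any(hides for _, hides in self.stack)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "style":
            self.in_style = True
        classes = (attrs.get("class") or "").split()
        hides = bool(HIDING.search(attrs.get("style") or "")) or any(
            c in self.hidden_classes for c in classes)
        href = attrs.get("href")
        if tag == "a" and href and _normalize(href) == self.target:
            if not (hides or self._inside_hidden()):
                self.found_visible = True
        if tag not in VOID:
            self.stack.append((tag, hides))

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.in_style:
            for name, body in CLASS_RULE.findall(data):
                if HIDING.search(body):
                    self.hidden_classes.add(name)


def links_back_visibly(source_html, target_url):
    """True if source_html links to target_url outside any (naively) hidden element."""
    finder = _VisibleLinkFinder(target_url)
    finder.feed(source_html)
    finder.close()
    return finder.found_visible
```

Against the excerpt above, the link sits inside `<div class="trackback">`, so this returns False. The check is deliberately conservative: it treats any hidden ancestor as hiding the link. Real CSS is subtler. A descendant can set `visibility: visible` and reappear, while `display: none` cannot be overridden that way. A faithful check would need a real CSS cascade and DOM.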

In the end, this “break” of the Validator may not yield much for this spammer aside from the satisfaction of successfully defacing my blog. Google has been known to apply a PageRank penalty to websites with large regions of hidden text, so the currency gained by inbound links may very well be more than offset. What’s more, like most modern blogs and CMSes, dsandler.org applies rel="nofollow" to any links found in comments or TrackBacks, so the spammer gets zero Google-juice in this situation.
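Adding `rel="nofollow"` means rewriting every link in the comment or TrackBack excerpt before display. Here is a minimal sketch using Python's standard-library HTML parser. `add_nofollow` is a hypothetical helper and not the code dsandler.org (or any particular CMS) actually runs. It re-emits the markup mostly verbatim but rebuilds each `<a>` tag with a `nofollow` token in its `rel` attribute. Known simplifications: end tags come out lowercased, and entity references missing their trailing semicolon gain one.

```python
from html import escape
from html.parser import HTMLParser


class _NofollowRewriter(HTMLParser):
    """Re-emits HTML unchanged except that every <a> tag gets rel="nofollow"."""

    def __init__(self):
        # Keep entity and character references as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _anchor(self, attrs, self_closing):
        attrs = list(attrs)
        for i, (name, value) in enumerate(attrs):
            if name == "rel":
                tokens = (value or "").split()
                if "nofollow" not in (t.lower() for t in tokens):
                    tokens.append("nofollow")
                attrs[i] = ("rel", " ".join(tokens))
                break
        else:  # no rel attribute at all
            attrs.append(("rel", "nofollow"))
        parts = ["a"] + [
            name if value is None else f'{name}="{escape(value, quote=True)}"'
            for name, value in attrs
        ]
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.out.append(self._anchor(attrs, self_closing=False))
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag == "a":
            self.out.append(self._anchor(attrs, self_closing=True))
        else:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def add_nofollow(html_fragment):
    """Return html_fragment with rel="nofollow" on every link."""
    rewriter = _NofollowRewriter()
    rewriter.feed(html_fragment)
    rewriter.close()
    return "".join(rewriter.out)
```

Even when a spammer's link survives the Validator, search engines that honor `nofollow` are told not to credit it.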

But since spam is so cheap, the spammer probably doesn’t care. That’s why the Validator was so important: it proved remarkably effective at reducing the “collateral damage” of spam, namely, blog defacement. In order to continue to be effective against this sort of attack, it would probably need to include some sort of CSS/DOM interpreter.

(Yuck.)

For more on all these icky edge-cases in TrackBack (and other forms of Web) spam, read the report. (It’s just a six-pager.)

2 responses

  1. jack  

    Great article about spam. Didn’t know that it was possible to use CSS too.
    Actually, I think spam can to a large degree be blamed on how the search engines rank sites. If the ranking were done in a different way, then the spam would also go away.

    comment posted at 6:19 am on 13 Sep 2007

  2. Spam y Trackbacks en Buayacorp - Diseño y Programación  

    [...] Based on a modified wp-trackback.php file that Maty sent me, I made some changes to it so that it does almost the same thing as the Trackback Validator plugin, which basically checks that the site sending the request contains a reciprocal link to the entry being referenced (see the paper for more details). The limitation of this method, as one of the people who worked on that project acknowledges, is that it can be easily evaded in several ways (with CSS, HTML comments, JavaScript, dynamic content generation, etc.). [...]

    comment posted at 6:57 pm on 27 Sep 2007
