dsandler.org

Slashdot is the grand-daddy of early-adopter websites, and as such has the most-subscribed-to RSS feed in the known universe. (If you look at individual metrics such as the top Bloglines feeds, Slashdot wins handily with over 20,000 subscribers.) Slashdot may have been the first website to actually implement RSS throttling (due to being the first website to need it); the policy states that clients are blocked not for their instantaneous polling frequency, but for their behavior over a sliding hour-long window.
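To make the sliding-window idea concrete, here is a minimal sketch in Python. It is not Slashdot's code: the class name, the in-memory storage, and the `MAX_HITS` threshold are all assumptions of mine, chosen only for illustration.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # the hour-long window from the policy
MAX_HITS = 50          # hypothetical limit; the real threshold is not stated here


class SlidingWindowCounter:
    """Track per-client hit times and flag clients that exceed a limit."""

    def __init__(self, window=WINDOW_SECONDS, max_hits=MAX_HITS):
        self.window = window
        self.max_hits = max_hits
        self.hits = defaultdict(deque)  # client id -> timestamps, oldest first

    def record(self, client, now):
        """Record a hit at time `now` (in seconds).

        Returns True if the client has made more than `max_hits`
        requests within the last `window` seconds.
        """
        timestamps = self.hits[client]
        timestamps.append(now)
        # Discard hits that have slid out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        return len(timestamps) > self.max_hits
```

The point of the window is that a client fetching twice in quick succession is not punished on the spot; only a sustained pattern over the hour trips the limit.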

I recently wrote about RSS throttling techniques, and expressed my concern about the scalability of tracking RSS hogs on the server. Slashdot’s Jamie McCarthy has just written a thoughtful response, including details of Slashdot’s implementation.

I’ll grant that our accesslog traffic is pretty I/O intensive. But if you were only talking about logging RSS hits and nothing else, it’d be a piece of cake.

[…]

How many RSS hits can you get in an hour? A hundred thousand? That’s peanuts, especially since each row is fixed size.

Jamie goes on to explain that the access logs are only periodically (every two minutes) sifted into a separate RSS-tracking table, which is then combed for faulty clients. These clients are then blacklisted in a third table (actually a file perused by a PerlAccessHandler) to enforce the block.
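The three stages Jamie describes (sift the log, comb for offenders, write out a blocklist) can be sketched roughly as follows. This is an illustration under my own assumptions, not Slashdot's implementation: the row layout, the `.rss` path test, the `max_hits` default, and all function names are hypothetical, and plain Python lists stand in for the database tables.

```python
from collections import Counter


def sift_rss_hits(access_log, since):
    """Stage 1: copy feed requests newer than `since` out of the access log.

    Each row is a hypothetical (timestamp, client_id, path) tuple.
    """
    return [(ts, client) for ts, client, path in access_log
            if ts > since and path.endswith(".rss")]


def find_offenders(rss_hits, now, window=3600, max_hits=50):
    """Stage 2: list clients with more than `max_hits` feed hits in the window."""
    cutoff = now - window
    counts = Counter(client for ts, client in rss_hits if ts > cutoff)
    return sorted(client for client, n in counts.items() if n > max_hits)


def write_blacklist(path, offenders):
    """Stage 3: write one blocked client per line for an access handler to read."""
    with open(path, "w") as f:
        for client in offenders:
            f.write(client + "\n")
```

Run on a timer (every two minutes, in Slashdot's case), the expensive counting happens in a batch, and the per-request check is reduced to a lookup in the blocklist.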

The point, though, is that these computations are a drop in the bucket because of all the other information Slashdot is collecting about each hit:

Slashdot’s resource requirements are actually a lot higher than this, since we log every hit instead of just RSS, we log the query string, user-agent, and so on, and also because we’ve voluntarily taken on the privacy burden of MD5’ing incoming IP addresses so we don’t know where users are coming from. That makes our IP address field 28 bytes longer than it has to be. But even so, we don’t have performance issues. Slashdot’s secondary table processing takes about 10-15 seconds every 2 minutes.
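The 28-byte figure is easy to check if you assume (as I do here; Jamie does not spell it out) that the hash is stored as a 32-character hex string and that the alternative would be a packed 4-byte IPv4 address. A quick Python sketch, with hypothetical function names and no claim about whether Slashdot salts the hash:

```python
import hashlib
import ipaddress


def hashed_ip(ip):
    """Return the MD5 hex digest of an IP address string (32 hex characters)."""
    return hashlib.md5(ip.encode("ascii")).hexdigest()


def packed_ip(ip):
    """Return the raw 4-byte form of an IPv4 address."""
    return ipaddress.IPv4Address(ip).packed


# 192.0.2.1 is a documentation-only example address.
extra_bytes = len(hashed_ip("192.0.2.1")) - len(packed_ip("192.0.2.1"))
```

Under those assumptions `extra_bytes` comes out to 32 - 4 = 28, which matches the quote.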

So if you’re Slashdot, and you’re already spinning your database hard for each HTTP fetch, performing RSS blocking is no big deal. Jamie agrees that it still has issues with IP address uniqueness (and points to this subthread of the recent RSS thread on /. for disgruntled users), but it definitely appears to be a workable stopgap solution for websites with big iron and fast databases.

One Response to “Real-world RSS throttling on Slashdot.”

  1. Regular Sucking Schedule says:

    Slashdot’s Jamie on RSS Throttling
    Jamie McCarthy writes very specifically about how to implement Slashdot’s approach to throttling aggressive little buggers trying to request-bomb your site, intentionally or not. I’m not sure that on lower-trafficked sites this strategy works. The p…



dsandler.org is Dan Sandler's website and notebook.
