
The “I’m only saying this once” algorithm for RSS throttling.

December 11th, 2004

Since last month, when I wrote about Glenn Fleishman’s RSS figures, Glenn has come up with a rudimentary solution:

I built a simple program running via Apache that throttles RSS downloads: a given IP and user agent combination can only request a given RSS feed file if it’s changed since they last retrieved it.

Glenn reports that this solution worked for him. It’s hard to tell from his daily RSS bar chart whether the change (deployed on 11/20) is responsible for the decrease in RSS bandwidth, which seems to begin a couple of days earlier; what concerns me most about this solution, though, is that it requires per-client state to be maintained on the server.

I’m uncomfortable with this solution because it’s hard to make it scale. First, you have to hit a database (of some kind) to cross-reference the client IP address with its last fetch time. Maybe that’s not a big deal; after all, you’re hitting the database to read your website data too. But then you have to write to the database in order to record the new fetch time (if the RSS feed has changed), and database writes are slow.
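To make the scaling concern concrete, here is a minimal sketch of the per-client scheme as described above (the class and method names are mine, purely illustrative; Glenn’s actual program runs inside Apache and presumably persists this table rather than holding it in memory):

```python
class PerClientThrottle:
    """Illustrative sketch of per-client RSS throttling: remember, for
    each (IP, User-Agent, feed) key, the feed version last served, and
    refuse to serve the same version twice."""

    def __init__(self):
        # Stands in for the server-side database of per-client state.
        self.last_served = {}  # (ip, user_agent, feed) -> feed mtime at last fetch

    def should_serve(self, ip, user_agent, feed, feed_mtime):
        key = (ip, user_agent, feed)
        # Read: cross-reference this client against its last fetch.
        if self.last_served.get(key) == feed_mtime:
            return False  # client already has this version; throttle it
        # Write: record the new fetch time -- this is the database write
        # on every successful request that worries me.
        self.last_served[key] = feed_mtime
        return True
```

Note that the write happens on every request that actually serves the feed, so the hot path can’t be read-only no matter how you cache.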

The other obvious problem with this approach is that IP address (well, IP address combined with User-Agent) is no good for hosts hidden by address translation. Actually, I don’t think the User-Agent string helps much at all, since hosts behind NATs (home computers, office computers) tend to have homogeneous software environments and are therefore likely to have identical UAs. So that means only one user behind a NAT is able to get each new RSS item. (Not one at a time: one, period. Until the RSS changes again, and then it’s a race to see who can snag the file first.) [Update: Dann Sheridan experiences this exact problem.]

2 responses

  1. Regular Sucking Schedule  

    Dan Sandler Comments on My RSS Throttling Technique
    Dan points out very adeptly that my idea of trying to throttle aggregation file requests on a per-client basis is riddled with holes. So perhaps my bandwidth gains (losses, as it were) are because I’m feeding out fewer unique requests to users behind…

    comment posted at 2:15 pm on 11 Dec 2004

  2. Jamie's Journal on Slashdot  

    Efficient RSS Throttling
    Dan Sandler has an article about RSS throttling… This is exactly what we do on Slashdot… Every hit, whether to a dynamically-generated perl script page, or to a static .shtml or .rss page, triggers an Apache PerlCleanupHandler…

    comment posted at 11:09 am on 15 Dec 2004
