“I built a simple program running via Apache that throttles RSS downloads: a given IP and User-Agent combination can only request a given RSS feed file if it’s changed since they last retrieved it.”
He argues that this solution worked for him. It’s hard to tell from his daily RSS bar chart whether the fix (deployed on 11/20) is responsible for the decrease in RSS bandwidth, since the decline seems to start a couple of days before that; but what concerns me most about this solution is that it requires per-client state to be maintained on the server.
I’m uncomfortable with this solution because it’s hard to make it scale. First, you have to hit a database (of some kind) to cross-reference the client IP address with its last fetch time. Maybe that’s not a big deal; after all, you’re hitting the database to read your website data too. But then you have to write to the database in order to record the new fetch time (if the RSS feed has changed), and database writes are slow.
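To make the scaling concern concrete, here is a minimal sketch of the throttle as described, assuming a SQLite table keyed on the (IP, User-Agent) pair and using the feed file’s change time as the freshness test. The schema, function names, and key format are my own illustrative assumptions, not the author’s actual code; the point is that every request costs a database read, and every served fetch costs a database write as well.

```python
import sqlite3
import time

# Hypothetical sketch of the per-client throttle: a SQLite row per
# (IP, User-Agent) key records that client's last fetch time, and the feed's
# modification time stands in for "has it changed". All names are assumptions.

def serve_feed(db, client_key, feed_mtime):
    """Return True if the client may download the feed, False if throttled."""
    row = db.execute(
        "SELECT last_fetch FROM fetches WHERE client = ?", (client_key,)
    ).fetchone()                                  # one database read per request
    if row is not None and row[0] >= feed_mtime:
        return False                              # unchanged since their last fetch
    db.execute(
        "INSERT INTO fetches (client, last_fetch) VALUES (?, ?) "
        "ON CONFLICT(client) DO UPDATE SET last_fetch = excluded.last_fetch",
        (client_key, time.time()),
    )                                             # one database write per served fetch
    db.commit()
    return True

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fetches (client TEXT PRIMARY KEY, last_fetch REAL)")
mtime = time.time() - 60                          # feed last changed a minute ago
print(serve_feed(db, "1.2.3.4|Mozilla/5.0", mtime))  # first fetch: served
print(serve_feed(db, "1.2.3.4|Mozilla/5.0", mtime))  # unchanged: throttled
```

Even in this toy version, the write on every served fetch is unavoidable: without it the server forgets who has already downloaded the feed, which is exactly the per-client state that makes the approach hard to scale.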
The other obvious problem with this approach is that the IP address (well, IP address combined with User-Agent) is no good for hosts hidden behind network address translation. Actually, I don’t think the User-Agent string helps much at all, since hosts behind NATs (home computers, office computers) tend to have homogeneous software environments and are therefore likely to have identical UAs. So that means only one user behind a NAT is able to get each new RSS item. (Not one at a time: one, period. Until the RSS changes again, and then it’s a race to see who can snag the file first.) [Update: Dann Sheridan experiences this exact problem.]
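The NAT failure mode is easy to simulate. In this hypothetical sketch (a plain dict stands in for the database), two different machines share one NAT and one stock browser install, so the server sees the same (IP, User-Agent) key for both, and whoever fetches first locks the other out until the feed changes again:

```python
import time

# Hypothetical simulation of the NAT collision: two distinct machines behind
# one NAT present the same public IP, and identical software environments give
# them the same User-Agent, so they collapse into a single throttle key.

last_fetch = {}   # (ip, user_agent) -> last fetch time

def allowed(ip, user_agent, feed_mtime):
    key = (ip, user_agent)
    if key in last_fetch and last_fetch[key] >= feed_mtime:
        return False                 # someone with this key already fetched
    last_fetch[key] = time.time()
    return True

mtime = time.time() - 60             # feed changed a minute ago
ua = "Mozilla/5.0 (Windows NT 10.0)" # homogeneous office software: same UA

# Alice and Bob sit behind the same NAT, so the server sees one IP for both.
print(allowed("203.0.113.7", ua, mtime))  # Alice wins the race: served
print(allowed("203.0.113.7", ua, mtime))  # Bob looks identical: throttled
```

From the server’s point of view Bob is indistinguishable from Alice re-requesting an unchanged feed, which is exactly the “one user per NAT, period” behavior described above.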