dsandler.org

Since last month, when I wrote about Glenn Fleishman’s RSS figures, Glenn has come up with a rudimentary solution:

I built a simple program running via Apache that throttles RSS downloads: a given IP and user agent combination can only request a given RSS feed file if it’s changed since they last retrieved it.

He argues that this solution worked for him. It’s hard to tell from his daily RSS bar chart whether the solution (deployed on 11/20) is responsible for the decrease in RSS bandwidth, which seems to start a couple of days before that. What concerns me most about this solution, though, is that it requires per-client state to be maintained on the server.

I’m uncomfortable with this solution because it’s hard to make it scale. First, you have to hit a database (of some kind) to cross-reference the client IP address with its last fetch time. Maybe that’s not a big deal; after all, you’re hitting the database to read your website data too. But then you have to write to the database in order to record the new fetch time (if the RSS feed has changed), and database writes are slow.
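To make the cost concrete, here is a minimal sketch of the per-client bookkeeping described above. It is not Glenn's actual program: the class name `FeedThrottle`, the in-memory dict standing in for the database, and the `reads`/`writes` counters are all assumptions for illustration. What the server sends back when it refuses a request is left out, since that isn't specified here.

```python
from dataclasses import dataclass, field


@dataclass
class FeedThrottle:
    """Hypothetical per-client throttle; a dict stands in for the database."""

    feed_modified: float                              # when the feed file last changed
    last_fetch: dict = field(default_factory=dict)    # (ip, user_agent) -> last fetch time
    reads: int = 0                                    # lookups performed (one per request)
    writes: int = 0                                   # updates performed (one per served fetch)

    def allow(self, ip: str, user_agent: str, now: float) -> bool:
        key = (ip, user_agent)

        # Every request costs a lookup to find this client's last fetch time.
        self.reads += 1
        seen = self.last_fetch.get(key)

        # Refuse if this client already fetched the current version of the feed.
        if seen is not None and seen >= self.feed_modified:
            return False

        # Serving the feed costs a write to record the new fetch time.
        self.last_fetch[key] = now
        self.writes += 1
        return True


throttle = FeedThrottle(feed_modified=100.0)
first = throttle.allow("198.51.100.4", "ExampleReader/1.0", now=150.0)
second = throttle.allow("198.51.100.4", "ExampleReader/1.0", now=160.0)
throttle.feed_modified = 200.0  # the feed changes
third = throttle.allow("198.51.100.4", "ExampleReader/1.0", now=210.0)
```

Even in this toy version, every request touches the table once, and every served request touches it twice. That second touch is the write I'm worried about.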

The other obvious problem with this approach is that IP address (well, IP address combined with User-Agent) is no good for hosts hidden by network address translation (NAT). Actually, I don’t think the User-Agent string helps much at all, since hosts behind NATs (home computers, office computers) tend to have homogeneous software environments and are therefore likely to have identical UAs. So that means only one user behind a NAT is able to get each new RSS item. (Not one at a time: one, period. Until the RSS changes again, and then it’s a race to see who can snag the file first.) [Update: Dann Sheridan experiences this exact problem.]
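The NAT collision is easy to reproduce in a few lines. This is a sketch under assumptions, not Glenn's code: the `allow` function, the dict used as the state table, the address and the reader name are all made up for illustration. Two people share one public IP and run the same aggregator, so they collapse into a single key.

```python
def allow(last_fetch, feed_modified, ip, user_agent, now):
    """Hypothetical throttle check keyed on (IP, User-Agent)."""
    key = (ip, user_agent)
    seen = last_fetch.get(key)
    if seen is not None and seen >= feed_modified:
        return False  # this key already has the current version
    last_fetch[key] = now
    return True


state = {}
FEED_MODIFIED = 100.0
NAT_IP = "203.0.113.7"        # the one public address the whole office shares
UA = "ExampleReader/1.0"      # everyone runs the same aggregator

# Two different people, indistinguishable to the server.
alice_gets_feed = allow(state, FEED_MODIFIED, NAT_IP, UA, now=150.0)
bob_gets_feed = allow(state, FEED_MODIFIED, NAT_IP, UA, now=151.0)

print(alice_gets_feed, bob_gets_feed)
```

Whoever asks first wins; the second request is refused even though that person has never seen the new item.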

2 Responses to “The “I’m only saying this once” algorithm for RSS throttling.”

  1. Regular Sucking Schedule says:

    Dan Sandler Comments on My RSS Throttling Technique
    Dan points out very adeptly that my idea of trying to throttle aggregation file requests on a per-client basis is riddled with holes. So perhaps my bandwidth gains (losses, as it were) are because I’m feeding out fewer unique requests to users behind…

  2. Jamie's Journal on Slashdot says:

    Efficient RSS Throttling
    Dan Sandler has an article about RSS throttling… This is exactly what we do on Slashdot… Every hit, whether to a dynamically-generated perl script page, or to a static .shtml or .rss page, triggers an Apache PerlCleanupHandler…


dsandler.org is Dan Sandler's website and notebook.
