Regular Sucking Schedule.
OK, RSS, I love you. But it has to be said: you suck.
Regularly. By design.
Not that there aren’t already enough acronym expansions for RSS out there, but you could call it “Regular Sucking Schedule”. If you’re Glenn Fleishman, you’re the first to do so, and you just set up a new blog to track solutions to the RSS scalability situation.
Despite what appears to be a scathing critique of Glenn’s server-side once-only throttling mechanism, I actually like the spirit of it: It is a serious attempt to force constraints on the system on the server end, rather than a polite request for clients to be better citizens.
Dennis Geels suggested this morning that the per-client state in Glenn’s technique could be stored on the clients themselves, in the form of a cookie. Of course, I don’t know whether RSS readers store and send cookies, and (as he quickly pointed out) the scheme would be trivially thwarted. But we’re not even worried about malicious users at this point, just well-meaning but overzealous clients. More clever still was his other idea: keep per-client state only for a sliding window of the last hour or so. Slower clients aren’t a problem, so forget them. It’s still not stateless, but the state is bounded.
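Here is a minimal Python sketch of the sliding-window idea. It assumes a throttle that refuses a second full fetch from the same client within the window. The class name `SlidingWindowThrottle`, the one-hour default, and the opaque `client_id` (an IP address, say) are my own illustrative assumptions. This is not Glenn’s or Dennis’s actual design.

```python
import time
from collections import OrderedDict


class SlidingWindowThrottle:
    """Hypothetical sketch: remember clients that fetched within `window` seconds.

    A client already seen inside the window is throttled. Older entries are
    discarded, so memory is bounded by the number of clients active in the window.
    Assumes `now` never goes backwards between calls.
    """

    def __init__(self, window=3600.0):
        self.window = window
        # client_id -> time of last allowed fetch, oldest first (insertion order)
        self.last_seen = OrderedDict()

    def _expire(self, now):
        # Pop clients from the front until the oldest one is still inside the window.
        while self.last_seen:
            oldest_time = next(iter(self.last_seen.values()))
            if now - oldest_time < self.window:
                break
            self.last_seen.popitem(last=False)

    def allow(self, client_id, now=None):
        """Return True if this client may fetch the full feed now."""
        now = time.time() if now is None else now
        self._expire(now)
        if client_id in self.last_seen:
            return False  # already served within the window
        self.last_seen[client_id] = now
        return True
```

Entries are only appended, and always in time order, so expiring stale clients just means popping from the front. The table never holds more than the clients seen in the last hour.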
At the IRIS student workshop, Mike Freedman (NYU; of the Coral p2p CDN) suggested to me that servers might limit the RSS feed to items created (or modified!) after the date in the If-Modified-Since header (or encoded in the ETag) that the RSS reader sends with each fetch.
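Here is a rough sketch of Mike’s suggestion. It assumes each item carries a timezone-aware modification time and that the client sends a standard If-Modified-Since date. The `Item` type and `items_since` function are hypothetical. A missing or unparseable header falls back to the full feed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


@dataclass
class Item:
    title: str
    modified: datetime  # hypothetical: timezone-aware modification time


def items_since(items, if_modified_since=None):
    """Return only items modified strictly after the client's If-Modified-Since date."""
    if not if_modified_since:
        return list(items)  # first fetch: send everything
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return list(items)  # unparseable header: behave like a plain GET
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    return [item for item in items if item.modified > since]
```

An empty result is the natural cue for the server to answer 304 Not Modified rather than send an empty feed.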
Despite my desire to avoid polling, I do like this idea. It’s a server-side solution which will work for multiple hosts behind a firewall, reduces bandwidth usage for every fetch (after the first), and exploits existing behavior of RSS reader HTTP implementations. It doesn’t require persistent per-client state on the server, either. [Update: dlg points out that this approach sounds a lot like HTTP delta encoding as applied to feeds.]
The only problems I saw with this approach are: (1) it relies on correct ETag support in the client (as I said, most clients do this now in order to support abbreviated 304 Not Modified responses); (2) it requires a little bit of extra processing on the server (presumably nothing unreasonable, since the server is already reading some recent subset of database entries, so why not make the WHERE clause a date comparison?); (3) it’s still polling. I know, I said I was over this, but—as you know by now—I think we can do better.
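To make point (2) concrete, here is one way the date comparison might look in SQL, using Python’s built-in sqlite3 module. The `items` table, its columns (with `modified` stored as Unix seconds), the `limit` of 20, and the `select_items` helper are all hypothetical. The sketch also returns 304 when nothing is newer than the client’s copy, which ties in with point (1).

```python
import sqlite3


def select_items(conn, since_epoch=None, limit=20):
    """Return (http_status, rows) for a feed request.

    Hypothetical schema: items(title TEXT, modified INTEGER), modified in Unix seconds.
    """
    if since_epoch is None:
        # No conditional header: the usual "most recent N entries" query.
        rows = conn.execute(
            "SELECT title, modified FROM items ORDER BY modified DESC LIMIT ?",
            (limit,),
        ).fetchall()
        return 200, rows
    # Conditional fetch: same query, plus a date comparison in the WHERE clause.
    rows = conn.execute(
        "SELECT title, modified FROM items WHERE modified > ? "
        "ORDER BY modified DESC LIMIT ?",
        (since_epoch, limit),
    ).fetchall()
    # Nothing newer than the client's copy: answer with an empty 304.
    return (304, []) if not rows else (200, rows)
```

The extra cost over the unconditional query is one comparison per row, and an index on `modified` would let the database skip old entries entirely.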
(In case you’re just tuning in, I’m working on a peer-to-peer RSS distribution system called FeedTree. The initial deployment will take the form of a proxy which you can use with your existing RSS software, but which will grab news items from the p2p overlay much faster than even the most aggressive polling schedule. For feeds which aren’t already being simulcast on FeedTree, the proxy will fall back to polling, but share the polled items with the overlay. You can find a little more information in my presentation slides, which are available from the IRIS-SW 04 program page. Look for a project page and some sample code soon.)