How did this happen? How did I, unrepentant DIY code junkie that I am, abandon my own code and pick up an off-the-shelf product? Worse still: How did I end up with another PHP web application, after swearing an oath of fidelity to Python?
To understand this, you have to go back to 2001, when I converted my stardot homepage into one of those newfangled “weblog” things. The funny thing about weblogs, of course, is that they’ve existed for eons, except they were just called “news pages” or “.plan files”. (Yeah, that’s right, that’s how we did it back in the nineties. Were you even in college in the nineties, gentle reader?)
The difference, though, when you create a weblog, is that you’re making a promise to update it now and again. It could be said that what defines a blog is not the format, but the frequency, and what most affects the frequency is the tool: the advent of accessible publishing frontends (including the pioneer, Blogger, and to a lesser extent UserLand Manila née Frontier). With a simple UI for prepending your webpage (as opposed to editing the whole thing, wiki-style), the tempo of a “news page” can really pick up, turning it into a blog.
So, it’s July of 2001, and the decline of my employer, Be Inc., has just entered its really steep part. It occurs to me that when things go sour at work, it makes for great reading, so I take my first steps into baldest vanity and decide to go blog. (You never, it turns out, go back.)
I built a Bourne shell script, diary-new, that would open my $EDITOR and save the result to a new file in an entries directory. I hacked together a quick index.php which would scan this directory, sort the files by their modification date, and spew their contents in reverse-chronological order. I shortly thereafter built diary-edit, which would allow me to tweak a file without also changing its modification date, and diary-mail-filter, a script to be called from Procmail so that I could blog-by-email.
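For the curious, here is a minimal sketch of those two ideas in Python rather than the original shell and PHP. It is an illustration under assumptions, not the code that actually ran the site: the function names are invented, and it assumes the entries are plain UTF-8 files sitting in a single directory.

```python
import os


def entries_newest_first(entries_dir):
    """Return the contents of every file in entries_dir, newest first.

    "Newest" means most recent modification time, which is the only
    date information a files-in-a-directory blog has to go on.
    """
    paths = [os.path.join(entries_dir, name) for name in os.listdir(entries_dir)]
    files = [p for p in paths if os.path.isfile(p)]
    files.sort(key=os.path.getmtime, reverse=True)

    bodies = []
    for path in files:
        with open(path, encoding="utf-8") as f:
            bodies.append(f.read())
    return bodies


def rewrite_keeping_mtime(path, new_text):
    """Overwrite an entry, then restore its old timestamps (the diary-edit idea)."""
    st = os.stat(path)
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_text)
    # Put the access and modification times back so the entry keeps its
    # place in the reverse-chronological listing.
    os.utime(path, (st.st_atime, st.st_mtime))
```

Note that every page view stats every file in the directory, which is exactly the cost that becomes a problem later in this story.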
Pretty much all of this was in place by the fall of 2001. [Tangent: Of course, it turns out that—surprise, surprise—you can't say anything really interesting in public about your company while you still work there. (It appears that not everyone has learned this lesson, years later.) So the early entries (from the first couple of months) are pretty vague sorts of half-stories leaving out all the interesting parts. Of course, it's not like we were all suddenly free to reveal our dirty corporate secrets once we started at the new place; Palm, Inc. was more regimented than Be in just about every way, in part because people actually cared what happened to Palm. Loose lips might really sink something over there! Although we did succeed in loosening the place up a bit, in the end. But I digress.]
As is the case with any temporary kluge, the shell and PHP scripts stuck. When I moved to a vanity domain from stardot, I took the opportunity to build a few new tools for the site. I had recently discovered a deep and lusty affection for Python, so I rebuilt the blog-by-mail engine using the rfc822 module, and added a search feature. More and more scripts cropped up, too (an RSS 1.0 feed; some automatic image-scaling code for posts with photos attached), and dsandler.org grew organically into the green (then white) site I came to know and love.
And hate. Boy, did I hate it. After a few hundred posts, the scan-a-whole-directory thing hit its first major hiccup. The Linux host didn’t like directories with so many files, so I created a new filing technique in which entries were stored in a subdirectory named for the first byte of the MD5 hash of the filename (e.g.: diary_1003.html moves to data/aa/diary_1003.html). This helped for a while, but after a few thousand posts, the time required to extract modification times from all those files became ridiculous (on the order of 30 seconds). Yes, that’s right, 30 seconds to load the front page of the blog. Bad idea.
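The filing trick itself is tiny. A rough Python sketch, assuming the "first byte" means the first two hex digits of the digest and that the hash is taken over the bare filename (the function name and the default `data` root are just for illustration):

```python
import hashlib
import os


def sharded_path(filename, root="data"):
    """Return root/<xx>/filename, where xx is the first byte of MD5(filename) in hex."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # One byte is two hex digits, so entries spread over at most 256 subdirectories.
    return os.path.join(root, digest[:2], filename)
```

This keeps any one directory small, but it does nothing about the real cost: finding the newest entries still means asking the filesystem for the modification time of every single file.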
The dsandler.org pagecache was born. Any page I wanted to speed up would be wrapped by cache_begin() and cache_end() tags; the dynamic cache would test the modification times of the data directories to see if anything had changed, and if so, render the whole page (slowly!), saving the output in a database blob at the same time it was sent to the user. The next user (assuming nothing had changed on disk) would see the cached version, pulled rapidly out of the DB. [The diary-new scripts were updated to force a reload on the main page after a new item was posted, so readers almost never saw the full delay.]
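The shape of that cache is easy to sketch. What follows is a loose Python approximation, not the PHP that ran on dsandler.org: it uses SQLite as a stand-in for whatever database held the blobs, and the class name, table layout, and `fetch` signature are all assumptions made for the example.

```python
import os
import sqlite3


class PageCache:
    """Cache rendered pages in a database, keyed on data-directory mtimes."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pagecache "
            "(page TEXT PRIMARY KEY, stamp REAL, body BLOB)"
        )

    def fetch(self, page, data_dirs, render):
        """Return the cached page if still fresh, else call render() and store it."""
        # Newest modification time across the watched directories.
        stamp = max(os.path.getmtime(d) for d in data_dirs)
        row = self.db.execute(
            "SELECT stamp, body FROM pagecache WHERE page = ?", (page,)
        ).fetchone()
        if row is not None and row[0] >= stamp:
            return row[1].decode("utf-8")  # fast path: nothing changed on disk

        body = render()  # slow path: rebuild the whole page
        self.db.execute(
            "INSERT OR REPLACE INTO pagecache (page, stamp, body) VALUES (?, ?, ?)",
            (page, stamp, body.encode("utf-8")),
        )
        self.db.commit()
        return body
```

One caveat with this approach: on most filesystems a directory's modification time changes when files are added, removed, or renamed, not when an existing file is edited in place, so a scheme like this needs some other nudge (such as a forced reload) to notice edits.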
Other things about the site annoyed me to no end. I could write new entries by email, but I had to log in interactively to edit them. I developed a simple Web-based posting form, but it just wrapped around the email interface (!) and so there was no way to know, a priori, what the URL of the new entry would be (it would be assigned at creation time, once the email was received). This made things like Trackback impossible (no way to know the final URL of the referring document). Oh, and the magic pagecache? Only used on the front page and the RSS feed. If you made the mistake of clicking the “N previous entries” link, welcome to Slowtown. (That was just a terrible way to browse the archives, as well.)
In short, weblog technology had moved way beyond my bubble-gum-and-twist-ties implementation. Even my other web projects blew dsandler.org away; the weblog and TV episode guide site I built for Erin is fully database-backed, has a fancy calendar-based archive, etc.
I decided to revamp dsandler.org, top to bottom.
Oh, this was about two years ago. Did I mention that?
I vacillated between writing my own implementation (in Python, natch; PHP’s API is so huge and hideous that it appears to have been assembled by a hundred monkeys, sitting at keyboards, each equipped with the Perl man page and an amphetamine tablet) and pulling something off the shelf. I had had a good experience with b2/cafelog—and ended up using it for an internal weblog at PalmSource—but really wanted to stay away from PHP (and Perl for that matter).
I also wanted to cleave to the simplicity of the original dsandler.org: files in directories. Clearly this led me to Rael Dornfest’s blosxom, and its Python-language cousin, pyblosxom. I tried out blosxom (again, at work) and discovered quickly that, as a side-effect of being simple, it was also simplistic. It just didn’t do a lot. Even after adding a bunch of third-party plugins, it still didn’t really have very many features. I figured I’d have to start writing my own plugins, and so, preferring to write code in Python when I can, I focused fully on pyblosxom.
This summer, I set aside my five or six half-implementations of a new pure-Python weblog package, and set about making pyblosxom work. I was able to build (roughly) the new dsandler.org design you see here without too much trouble, including some code tweaks (so I’d have the right data around in my page templates). I knew I’d have some more hardcore plugin writing to do in order to build some of the fancier features I had imagined (including something as simple as next-day/previous-day links, which are hard in the pyblosxom architecture), but I’d gotten the site to a point where I wanted to fill it with data and make it live. As soon as I wrote a script to import the old dsandler.org entries into pyblosxom, I discovered that I had a new problem.
Actually, it was an old problem.
The blosxom variants have no caching, no by-date indices, no data structures of any kind to reduce the dependency on recursive descent of all files in all directories. I consulted the various (py)blosxom websites; surely there must be other installations out there with thousands of entries. Those sites would show the same sorts of performance problems, and their owners would have found workarounds, right? Apparently not: I was back where I started, waiting thirty seconds for a page load. (Longer, actually.) I might as well have stuck with my own PHP and shell scripts.
It seems that most blosxom users with more than a few dozen posts just render their weblog pages to static HTML! (That is, they re-run the blosxom application after each change, and it combs the filesystem creating static HTML pages for every possible view of every category, date, individual post, etc.) Ridiculous—we’re back to Frontier, now, emitting a pile of HTML and FTPing it to a webserver. No. Fricking. Way.
Deflated, bruised, despairing, I decided that my problem was simple: other people’s software sucks. I spent another few months banging around with custom Python indexing routines and templating code. But my spare time was getting thinner and thinner (now that I’m back at school), so I did the unthinkable: I went back to PHP.
I took another look at b2, now called WordPress. Matt Mullenweg, unlucky at cards though he may be, seems to have some luck with weblog software: WP has gotten rave reviews, and new installations seem to crop up everywhere.
But it was PHP! But it required MySQL! These things vexed me still.
And damn if it doesn’t work. From a features standpoint, WP includes fifteen different kitchen sinks, but the administrative UI is totally manageable (and the template functions are reasonable, if not always totally consistent). The third-party developer community is active and prolific, and I quickly found an implementation of almost every feature I had imagined for the site (including next-day/previous-day links). And after a little time with the PHP code, I became pretty comfortable that I’d be able to hack together whatever I needed if I couldn’t find it elsewhere.
Total time to convert an old pyblosxom design into a new WordPress design, including a new article import scheme (thanks to WP’s RSS import this wasn’t too hard, though I did fix a few bugs in import-rss.php), a custom linkrot-curing script (so the old /entries.php?NNN links still work, despite the fact that the new version of each link includes the date and WP-sanitized title, so there’s no direct conversion), full data transfer from dsandler.org, and testing/tweaking: six hours.
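Since the new permalinks can't be computed from the old numbers, the linkrot cure boils down to a lookup table built at import time. A hedged Python sketch of that idea (the real script was presumably PHP; the function name, the dictionary, and the sample permalink in the usage lines are all hypothetical):

```python
def resolve_old_link(query_string, permalinks):
    """Map the NNN in /entries.php?NNN to a (status, location) pair.

    permalinks is a dict from old numeric entry ID to new permalink path,
    assumed to have been recorded while importing the old entries.
    """
    try:
        old_id = int(query_string.strip())
    except ValueError:
        return 404, None
    if old_id in permalinks:
        return 301, permalinks[old_id]  # permanent redirect to the new home
    return 404, None


# Usage with a made-up table entry:
table = {1003: "/2004/01/02/example-post/"}
status, location = resolve_old_link("1003", table)
```

A permanent (301) redirect is the natural choice here, since the old URLs are never coming back.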
That’s a big deal. Matt should be proud of what he’s created, and I am proud—or will be, eventually, honest—to have finally given in and embraced someone else’s weblog software.