On Twitter as a research problem (dsandler.org)

December 8th, 2008

Recently, I’ve spent quite a bit of time thinking about _microblogging_, exemplified of course by the very popular [Twitter](http://twitter.com) service. While I finish preparing a technical report describing the [FETHR](http://brdfdr.com) system we’ve developed[^2], I thought I’d take a few minutes to discuss why I find this topic compelling.

[^2]: The work is also currently under submission to a competitive conference; a technical report is useful in this case to gain additional feedback from the broader microblogging community and to spur experimentation in the wild.

The microblog is something of an odd bird, defying easy classification or even explanation (as evidenced by the many [attempts](http://www.commoncraft.com/Twitter) to articulate its purpose and properties).
[Originally](http://flickr.com/photos/jackdorsey/182613360/) [conceived](http://www.pbs.org/mediashift/2007/05/twitter-founders-thrive-on-micro-blogging-constraints137.html) as a way for strongly-connected friends[^3] to keep track of one another’s whereabouts and present activities using SMS, it’s become a kind of ultra-lightweight conversational tool that doesn’t map exactly onto any existing publishing or communication system.
Twitter’s single prompt, “What are you doing?”, is almost vestigial at this point; the service is evolving into a *micropublishing* platform, and as such has become an important and interesting topic of research.

[^3]: Presumably those in urban areas where “I’m at the coffeehouse” is an actionable piece of data. Some still use Twitter for this kind of lazy real-world rendezvous, but most have adopted a more blog-like approach to the system. One of the noteworthy things about microblogging is that it supports all these modes of interaction simultaneously.

Microblogging has already had impact. We can speculate—from the amount of [news](http://www.cnn.com/2008/TECH/04/25/twitter.buck/index.html) and [media](http://news.bbc.co.uk/1/hi/technology/7287536.stm) [coverage](http://money.cnn.com/2008/08/06/technology/true_meaning_of_twitter_lashinsky.fortune/) that the phenomenon has [earned](http://topics.nytimes.com/top/reference/timestopics/subjects/t/twitter/index.html)[^201]—that it is poised to take its place next to the blog as a prominent method of publishing and interacting online.
Many now rely on services like Twitter for business and personal interaction; beyond the “ambient awareness” of physically distant friends and neighbors, microblogging now finds use in business networking, [customer service](http://twitter.com/griffintech), [national politics](http://twitter.com/BarackObama), [journalism](http://venturebeat.com/2008/09/08/new-cnn-show-pushes-the-limits-of-twitter-literally/), and general lazyweb[^1]-style requests. Microblogging has unquestionably become a part of the everyday lives of active users, and in extreme situations—such as the recent Sichuan earthquake—it can literally be a lifeline.[^200]
Encouraging and improving these services is therefore an important and valuable goal.

[^201]: The _New York Times_ seems particularly smitten.

[^200]: Could Twitter actually save lives? Andy Carvin [speculates](http://www.andycarvin.com/archives/2007/03/can_twitter_save_lives.html) that with a few sophisticated group features it might be a tool for mobilizing relief efforts. The work I’m doing would help to enable this.

[^1]: **la•zy•web** *n.* The collected wisdom of millions of internet users, mythical solver of problems cosmic and quotidian. First supplicated by name in 2005 or thereabouts by [@jwz](http://twitter.com/jwz).

Plot of Twitter users, ranked by number of subscribers (followers). Unpublished, 2008.

As a computer scientist, I’m interested in microblogging systems in part because of their unique properties.
They are remarkably spam-free, mostly due to the way in which users explicitly select those senders whose messages they wish to see (by “following” them).
The many subscription links between microbloggers form an interesting social graph: can we put this network to use in some way?[^99]
Microblogging is also an unusual mode of communication, falling somewhere between blogs, chat, IM and BBSes in terms of how, when, and to whom messages are distributed.
Even the way users consume messages is peculiar: a constant stream of updates commingling their own messages with those from their friends, ensuring that no two users have the same view of the system.[^98]
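
To make that last point concrete, here is a minimal sketch of how a per-user timeline could be assembled from a follow graph. It is purely illustrative: the names `Update`, `timeline`, `following`, and `posts` are hypothetical, and this is not how Twitter or FETHR actually builds timelines.

```python
from typing import Dict, List, NamedTuple, Set


class Update(NamedTuple):
    """One short message (hypothetical structure, for illustration only)."""
    timestamp: int
    author: str
    text: str


def timeline(user: str,
             following: Dict[str, Set[str]],
             posts: Dict[str, List[Update]]) -> List[Update]:
    """Return the user's own updates mixed with those of everyone they follow, newest first."""
    # A user sees their own messages plus those of each account they follow.
    sources = {user} | following.get(user, set())
    merged = [u for author in sources for u in posts.get(author, [])]
    # Sort by timestamp, most recent update first.
    return sorted(merged, key=lambda u: u.timestamp, reverse=True)


# Assumed toy data: alice follows bob; bob follows alice and carol.
following = {"alice": {"bob"}, "bob": {"alice", "carol"}}
posts = {
    "alice": [Update(1, "alice", "at the coffeehouse")],
    "bob": [Update(2, "bob", "on my way")],
    "carol": [Update(3, "carol", "reading")],
}

alice_view = [u.author for u in timeline("alice", following, posts)]
bob_view = [u.author for u in timeline("bob", following, posts)]
carol_view = [u.author for u in timeline("carol", following, posts)]
```

Because each view depends on who a user follows, alice, bob, and carol each end up with a different stream even though the underlying set of messages is the same.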

[^98]: In this way it might be said to be most similar to Facebook’s News Feed, introduced to early controversy, due in large part to the dramatically—for some, shockingly—increased importance and persistence of formerly ephemeral bits of Facebook data like the personal status.

[^99]: Some examples taken from the FETHR paper: abuse/spam detection; recommendations/introductions; and, of course, update distribution.

Twitter's fail whale, out over the open sea.

Microblogging is every bit as compelling a research topic because of its **limitations**: specifically, the limitations of the flagship microblog service, Twitter.
First, it is a large but entirely *centralized* system; there are currently a few million registered user accounts[^100], of which maybe half a million are active[^101]. How much larger can it get?
The folks at Twitter are, by all accounts, barely keeping up with their own success; in fact, scaling problems have been at various times the subject of much public frustration.
As a result, Twitter is also *fragile;* its users are all unavoidably bound to Twitter’s robustness and reliability.
When Twitter goes down, service is completely interrupted for everyone.
It is a *closed* system, by which I do not necessarily mean that the source code is not available, but rather, the ways in which the system *functions* are not up for debate or (easy) amendment by the community.[^86]
Twitter, Inc. is a dictator, albeit a benevolent one, and while users may wish to switch to another service—and there are plenty of Twitter-alikes to choose from, many promising more advanced features or better reliability—powerful [network effects](http://en.wikipedia.org/wiki/Network_effect) prevent users from leaving. After all, everyone you know is already on Twitter, and they’re *not* on, for example, Pownce.[^87]

[^87]: Not least because it’s been bought and [shut down](http://blog.pownce.com/2008/12/01/goodbye-pownce-hello-six-apart/).

[^86]: The [Twitter API](http://twitter.com/help/api) allows third parties to develop software that talks to the existing Twitter service, including desktop clients, search engines, and, yes, research code such as my own. What’s not possible at this point is to change the way Twitter works; for example, external developers can’t turn off Twitter’s use of TinyURL to shorten URLs (causing an unfortunate fate-sharing between Twitter and TinyURL, which has suffered its own reliability problems). More to the point, the Twitter microblogging network is isolated: there’s no way for a third party to offer Twitter users the ability to follow users of other systems or vice versa.

[^100]: According to [TwitDir](http://twitdir.com), which is having load issues as I write this; TwitterFacts [reports](http://twitterfacts.blogspot.com/2008/10/barackobama-followers.html) TwitDir’s estimation of the Twitter community as of October to be just over 3 million.

[^101]: Based on our measurements during a three-week period in September, during which we observed 4,917,042 public messages from 472,735 users.

Timeline entanglement in FETHR. Unpublished, 2008.

The potential is certainly there for Twitter to become, as its founders style it, a [“communication utility”](http://blog.twitter.com/2008/06/welcoming-bijan-and-jeff.html); whether or not Twitter can actually achieve this aim depends in large part on its technical evolution. That’s where my recent work fits in the microblogging timeline; watch this space.
