Todd wonders why I haven’t
mentioned Amazon’s fulltext
searching of their books. Here’s what I told him:
Re: Amazon’s searching: I’m a little less impressed with the whole
arduous book-scanning endeavor, because, frankly, I doubt they scanned a
single page. I figure almost all the books you can search, all the
books that Amazon’s selling, are in current print, which means that
their publishers are rolling paper pulp under rubber offset printing
rollers right now. The rubber picks up ink from a lithographic plate
wrapped around another roller; the lithoplate was etched by exposing it
to UV light (could be visible light; this process varies depending on
the chemicals used on the surface of the plate) underneath a
photonegative of the page printed on celluloid; the negative was
generated by a digital printer (dye, or ink, or laser), from digital
prepress data (PostScript, generated by Quark or TeX or whatever).

So, basically, the publishers have the text of all these books in
digital form already. It’s just a matter of Amazon getting a hold of
the files, and this falls under the umbrella of business development,
which is truly a terrifying and magical force. (There’s some
postproduction required to extract the text, build the search engine,
blah blah, but this is more straightforward. You can license or buy
technology to cover almost all of that.)

I have to admit that the public availability of these texts is a really
interesting development; the corpus linguist in me, for example, is
delighted at the opportunity to analyze so much text in an automated
fashion.

Except, of course, you’re not allowed to do that. So, you know, I’m
less excited. In fact, as it turns out, I really don’t have much use
for the service, in its current form, at all. So I haven’t used the
service yet.