I am currently a software engineer at Google, where as a member of the Android platform team I build frameworks and user interfaces.

The blog here at is mostly historical; you can find more recent posts on .

Music library consolidation—with sane metadata

July 25th, 2005

Dear music-geek lazyweb: How do you deal with the problem of consolidating bulk-ripped CDs with your existing iTunes library? Here’s the scenario:

You have some of your music already ripped to mp3/AAC, but then decide to farm the process out to one of those buck-a-disc services that will mail you back a stack of DVDs containing shiny new FLAC or Apple Lossless files. Now you have to replace the low-quality media in your library (iTunes, for the sake of discussion) with the new hotness. Seems easy enough, right? Just look for duplicate albums and replace old with new.

However, you’re really picky about your metadata (artist, album, year, etc.), and you don’t control the metadata that your CD ripping service applies to the tracks they send back. Maybe the newly ripped audio files have the artist “REM” (instead of “R.E.M.” as it should be) or consider the year on compilation tracks to be the year of the compilation (e.g. 2005) instead of the year of the recording (e.g. 1995). These things happen; even the most expensive bulk-CD ripper may differ with you on subtle points such as the correct rendering of sigur rós song titles.

Now it’s a big ugly problem. A piece of software to analyze two libraries (your existing library with “good” metadata, and your new library of freshly ripped high-quality audio with possibly “bad” metadata) needs to be able to decode the various tag formats (including the tags in the MPEG-4 wrapper around ALAC/AAC audio, which have an undocumented format) and apply heuristics to decide which new tracks correspond to which old ones despite their differing tags. Ideally it would ask the user to confirm the huge merge operation (which would take some time) before doing anything, and of course it would need to actually write the correct tags to the high-quality audio (undocumented formats make this annoying). The end result is a “golden” music library, with the highest quality audio and the most “correct” metadata.
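
To make the matching part concrete, here is a minimal Python sketch of one possible heuristic. It assumes the tags have already been read into plain records, so decoding and writing the real tag formats is out of scope. The names `Track`, `match_key`, and `plan_merge` are hypothetical, invented for this illustration, and the rule "fold case, accents, and punctuation, then compare" is just one guess at what such a tool might do.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical record holding tags already read from a file."""
    path: str
    artist: str
    album: str
    title: str
    year: int


def match_key(text: str) -> str:
    """Fold case, accents, and punctuation so near-identical tags compare equal.

    Assumption: tags are mostly Latin script; anything that is not an ASCII
    letter or digit after folding is dropped.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", "", no_accents.casefold())


def plan_merge(good, new):
    """Pair each new rip with the curated track whose tags it should inherit.

    Returns (matched, unmatched): matched holds new audio paths carrying the
    curated tags; unmatched holds rips that need a human to look at them.
    """
    by_key = {
        (match_key(t.artist), match_key(t.album), match_key(t.title)): t
        for t in good
    }
    matched, unmatched = [], []
    for rip in new:
        key = (match_key(rip.artist), match_key(rip.album), match_key(rip.title))
        old = by_key.get(key)
        if old is None:
            unmatched.append(rip)
        else:
            # Keep the new audio file, but take every tag from the curated library.
            matched.append(Track(rip.path, old.artist, old.album, old.title, old.year))
    return matched, unmatched
```

Under this scheme a rip tagged “REM” lines up with a library entry tagged “R.E.M.”, and the curated year wins over whatever the ripping service supplied. The `unmatched` list is what you would put in front of the user before confirming the merge.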

Is there any software “out there” that even addresses some small part of this issue? (iTunes’ “duplicate song” feature is not helpful; it seems to exist for the express purpose of saving disk space by deleting tracks from your “best-of” albums.)

Leave a comment if you’ve got any ideas. (Did I get the jwz lazyweb tone—self-important while at the same time asking for help—right?)

7 responses

  1. Chris Sanders  

    First, I’d recommend that you re-rip your CDs yourself. iTunes stores metadata that you enter the first time it sees a CD, and can apply that (including your esoteric punctuation and/or comments) to subsequent rips (even those that use different encoders). Sometimes it’s smart enough to overwrite the lower bitrate version, other times you have to sort by “Kind” and weed out the old & busted. That’s how I made the switch. It’s a pain in the ass to swap the CDs through all over again, but it’s strangely satisfying.

    comment posted at 3:50 pm on 25 Jul 2005

  2. dsandler  

    “First, I’d recommend that you re-rip your CDs yourself.”

    But what if you don’t have 50 CDs to re-rip, but 500? 1000? This is why the “other people’s metadata” problem exists at all.

    That aside, you make a very good point (that iTunes will notice that it’s seen this CD before and that you’ve fixed the tags on it already). Although—what if you ripped the CD and made changes to the tracks in your library later? Will iTunes apply those updated tags to the CD when you re-insert it?

    comment posted at 3:58 pm on 25 Jul 2005

  3. dsandler  

    Chris Liscio points out mp4meta.cpp, the result of reverse-engineering on the ALAC tag format. One small piece of the puzzle. (Care to write a SWIG wrapper for it, Chris?)

    comment posted at 4:03 pm on 25 Jul 2005

  4. Rod Begbie  

    Welcome to my world. A large chunk of my job is worrying about the quality and coverage of various music metadata sources, and how that affects user experience in the living room.

    My personal solution has been to live with the MusicBrainz database (http://musicbrainz.org) and its associated style guide. iEatBrainz has been keeping my tags in check for the last year or two, so everything is reasonably consistent already.

    That said, I’m in the middle of writing a whole mess of Python scripts to re-rip my music collection in FLAC and store pointers to a variety of metadata sources with each file, so that if, for example, Amazon gains cover art for an album or MusicBrainz fixes a typo in the data, I can grab it and retag my files in one fell swoop. (In theory. This part isn’t done yet!)

    comment posted at 7:46 pm on 25 Jul 2005

  5. Rod Begbie  

    And folks are sufficiently anal on MB to satisfy all your Sigur Rós dreams: http://musicbrainz.org/album/4b374b09-2e39-47a3-9819-8a0eae21db66.html

    comment posted at 7:48 pm on 25 Jul 2005

  6. dsandler  

    begbie++;

    comment posted at 10:40 pm on 25 Jul 2005

  7. Tom  

    I recommend MusicBrainz as well. Best of luck with it.

    comment posted at 4:53 pm on 27 Jul 2005
