MP3 tagging
#347 Henry, Sunday, 19 June 2016 10:38 PM (Category: Music)
(Tags: mp3 tag music)

This is a saga about my music library and maintaining it. This is not for everyone. It is my hobby, and it gives me pleasure to keep my library maintained, to write software and scripts for it, and to fiddle with it. This is about me having fun, my way. Your mileage will undoubtedly vary.

Streaming vs Own Library

I know that most people have abandoned their own music libraries and simply use Spotify or Rhapsody or Apple Music, paying a monthly fee for access to a reasonable range of music. I hear that Amazon is about to fire up their own streaming library too. That's all good: it's an easy way to get access to good quality music with zero maintenance.

I don't participate. I have a fine little library of music that spreads across many genres, and is deep in areas that I like. I am particular about some things. For example, I like early Mike Oldfield, especially Ommadawn and Hergest Ridge. I grew up with those two pieces and really love them. I listened to them on CD recently and went WTH? That's not what I remember. So I researched. He remastered them and, for my tastes, the new versions sound like completely different pieces of music. I want to listen to what I grew up with, not what he now considers the definitive version. So I bought the boxed sets of these two pieces, which contain both the new versions and the original versions. I ripped them and can now listen to the original versions, the way my mind hears them. But on the streaming services, I think all you can get is the new versions. That's just one example of why I don't want to rely on the streaming services.

Also bandwidth. My way, my music is on an HDD and there's no wasted bandwidth. My co-workers see the HDD as a downside: they don't want to carry one, manage it and back it up. And they don't care how much of work's bandwidth they use during the day.

If Spotify loses an artist and has to pull all that artist's music, it doesn't affect me. If Rhapsody has only the last few releases by a band, say Gaye Bykers On Acid, when they had lost their fyre, but doesn't have their first two releases - Drill Your Own Hole and Stewed To The Gills - that doesn't affect me either. I bought the vinyl and converted them to mp3s myself. Much better music than the later stuff.

So I maintain my own curated library of music.

Library Metadata

But one of the issues with maintaining my own curated library of music is the metadata - the tags. A library of music has to have accurate metadata so you can find it, sort it, organise it, enjoy it. Music players rely on the tags in the mp3 files to make sense of it all.

So the tags are stored in the mp3 files. The file contains both the music and the metadata. For most people, that is the only location of the metadata, the tags. If they replace their mp3s, say a re-rip for better quality, they have to enter all that data in again. You could use external services to populate the tags for you. I estimate that the public suppliers of track data and tags are pretty good with tags for new rock and pop. Probably 95+% accurate. For older rock and pop, it's really variable. Maybe 70% accurate. For classical music, it's abysmal. So many errors, so much missing data. I care about this stuff, but nearly everyone else doesn't. It's a hobby for me, like collecting stamps. I want the data to be right. Storing the data in the music file and having to re-enter it if the music file changes is not the way I want to go. I want a permanent repository of data that exists outside the music file. And not in a database. I want something simple, text based, and stored with the mp3s so it's portable.

A very long time ago, I collaborated with Richard on this very issue. We went through a number of iterations until we came to a system that we were happy with. In each directory with mp3s, we store a simple text file called track.dat. This contains all the mp3 tag data I could possibly want, in a simple scheme. I have a Perl program called that reads that file and adds the tags to the mp3s. If I replace the mp3s, I just run and it uses the data in track.dat and shoves it into the mp3 so the music players can see it. If I get better data, or fix mistakes in the data, I just run again and again and again and it takes the information in track.dat and stuffs it into the mp3.

The Library

I have my library of music at home, and at work, mostly on Linux machines, and occasionally on a Mac so I can put some music on my iPhone. The prime repository is not a Mac. It is an external hard disk on my Linux desktop. iTunes is not my prime repository, iTunes is just the end of a chain, a conduit for putting the music in my device. I don't listen on my iPhone much any more. I listen mainly on my home and work Linux desktops. So my scheme has to be OS independent; independent of any one music player; simple, text-based for easy editing; and able to work with a lot of data.

I have 123,000 tracks in my library, about 9,000 CDs or LPs or whatever. I have a lot of data. I enjoy working with it, and I enjoy listening to it.

id3v2 and Sort Order

Richard and I developed that scheme in 2005, and we've been using it ever since. We each use it in different ways. It's a great initial design but we've deviated in the application. Shows how good it is that it works for both of us. As the years have gone by, I have improved both the metadata I store in track.dat, and the tools I use to take the metadata out of track.dat and insert it into the mp3s.

I'm on Linux so I use the id3v2 command line utility to put the tags into the mp3s. This utility dates back quite some years, and hasn't been updated in a while. It worked pretty well for me till about 2010, and then the tagging scheme changed a little. I think it handles mp3 tags up to version 2.3, but not 2.4 and beyond. Until last week, I continued to use it.

Over the years, I have only ever had one problem and that was the artist and composer name order. The way I've been doing it, Tim Buckley appears under T for Tim. I would prefer he appear under B for Buckley. Similarly, Claude Debussy appears under C for Claude, rather than D for Debussy. If you want to find Debussy's music, you have to know his first name. I always forget the first names of the classical composers. I want the artists and composers to appear in last name order.

MP3 tags are stored in frames with four-character names. For example, the artist name (really performer name = P) is stored with the TPE1 tag, and the composer name is stored with the TCOM tag. Modern music players do accept tags for sort order. Sort-artist is stored with TSOP, and sort-composer is stored with TSOC. So to achieve the desired result for Tim Buckley and Claude Debussy, I would have tags like this:

TPE1=Tim Buckley
TSOP=Buckley, Tim
TCOM=Claude Debussy
TSOC=Debussy, Claude

and the normal name is displayed, but the sort version is used for sorting. Works really nicely. Except that the id3v2 utility does not accept the new tags, so I can't put them into the mp3.
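For the common "First Last" case, the sort version can be derived mechanically. Here is a minimal sketch in Python, assuming the last word of the display name is the surname. The function name sort_name and the overrides table are made up for illustration and are not part of the tools described in this post. Band names and multi-word surnames break the rule, so they need an explicit override.

```python
def sort_name(display_name, overrides=None):
    """Return a 'Last, First' sort key for a 'First Last' display name.

    Assumption: the last word is the surname. Anything that breaks that
    rule must be listed in `overrides` (display name -> sort name).
    """
    overrides = overrides or {}
    name = " ".join(display_name.split())  # normalise stray whitespace
    if name in overrides:
        return overrides[name]
    first, _, last = name.rpartition(" ")
    if not first:
        return name  # single-word names stay as they are
    return f"{last}, {first}"


print(sort_name("Tim Buckley"))
print(sort_name("Claude Debussy"))
# A band name should not be flipped, so it gets an explicit override.
print(sort_name("Gaye Bykers On Acid",
                {"Gaye Bykers On Acid": "Gaye Bykers On Acid"}))
```

The override table is where the judgment calls live; the code only handles the easy names.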

There's also one downside to the new scheme. Most modern music players have the concept of an ALBUM-ARTIST. This helps with compilation CDs. The album artist is stored with the tag TPE2 and the sortable version in tag TSO2. This is really unfortunate for classical music because traditionally TPE2 was for the orchestra and TPE3 was for the conductor. But now that TPE2 has been co-opted as the album artist, you can't store the orchestra in the mp3. I was bummed out when I found this.

So I would like to use the new tags that allow sort artist and sort composer, but the utility id3v2 does not support them. The source code is available, and I have worked with it, but I cannot make the modifications to allow the newer tags. I am not clever enough a programmer.


And that is where matters stood until recently. As part of an effort to find an alternative, I installed a Python module called mutagen which looked really promising. I played with it, and it's neat. I could write little utilities to interrogate mp3s and work with them. And it's all Python. Love it.

Then I found that mutagen comes with a utility called mid3v2 which is a drop-in replacement for id3v2. Same switches, same everything. Presumably the name is something like Mutagen id3v2. It's a Python script, so I have a better chance of understanding it. I've been playing with it, and it works and it's now my base tagging utility. All I had to do to get it into use was change "id3v2" in my to mid3v2. The irony of calling a Python program with a Perl program has not escaped me. Best of all, mid3v2 supports the sort order tags. So now I can do everything I want to do with tagging.
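As a rough sketch of what a call to mid3v2 with sort order tags could look like when built from Python: mid3v2 lets you set a frame by passing its four-character ID as a "--FRAME value" option. The helper name mid3v2_command and the file name track01.mp3 are made up for illustration. This is not the script described above, just one way to assemble the command line.

```python
import shlex


def mid3v2_command(mp3_path, frames):
    """Build an argv list that sets each ID3 frame through mid3v2.

    `frames` maps a four-character frame ID (e.g. "TPE1") to its text value.
    """
    argv = ["mid3v2"]
    for frame_id, value in frames.items():
        argv += [f"--{frame_id}", value]
    argv.append(mp3_path)
    return argv


# Hypothetical example file and values.
frames = {"TPE1": "Tim Buckley", "TSOP": "Buckley, Tim"}
cmd = mid3v2_command("track01.mp3", frames)

# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

To actually run it, pass the list to subprocess.run(cmd, check=True). Using a list rather than one long string avoids quoting problems with names that contain commas, spaces or apostrophes.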

And now I face a new project - a major overhaul of every track.dat file, adding new tags, improving the data to take advantage of the new features, cleaning up some earlier design decisions. And then re-tagging all the mp3s. This is going to be fun.

I have over 10,000 track.dat files. Luckily, I do not have to edit every one by hand. I already have utilities - both shell and Perl - that read all the track.dat files, search for one tag and value, and replace it with another tag and value. It was short work to enhance them so they search for one tag and value and append another.

So I can say "search for artist = Tim Buckley" and add the line "sort-artist = Buckley, Tim" on the next line, and it will read through all 10,000 files and edit them. I backed up my data first. But it works cleanly and nicely and takes about three seconds to read and edit all 10,000 track.dat files. I have been cleaning up composers and artists, and having to make decisions. How do I determine the sort order for Nusrat Fateh Ali Khan? What is the last name of Ralph Vaughan Williams? What about Orlande de Lassus? Does that go under D or L? For me this is fun.
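The search-and-append idea can be sketched in a few lines of Python. This is an illustration only, not the shell and Perl utilities mentioned above. It assumes each track.dat holds one "key = value" pair per line, as in the example just given, and the function names are made up.

```python
from pathlib import Path


def append_after_match(text, match_line, new_line):
    """Insert new_line after every line equal to match_line.

    Skips the insert when new_line already follows, so the edit can be
    re-run safely.
    """
    lines = text.splitlines()
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if line.strip() == match_line:
            following = lines[i + 1].strip() if i + 1 < len(lines) else ""
            if following != new_line:
                out.append(new_line)
    # Preserve a trailing newline if the original text had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


def update_library(root, match_line, new_line):
    """Apply the edit to every track.dat under root; return how many changed."""
    changed = 0
    for path in Path(root).rglob("track.dat"):
        old = path.read_text(encoding="utf-8")
        new = append_after_match(old, match_line, new_line)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed += 1
    return changed
```

A call such as update_library("/path/to/music", "artist = Tim Buckley", "sort-artist = Buckley, Tim") would then walk the whole tree. Back up first, as always.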

There is one other advantage to mid3v2. It's fast. id3v2 was quite slow on external USB hard drives. mid3v2 is not slow at all.


So there we are. I have a new project to work on. At the end, I will have a nicer music library that is better organised. I will have learnt a lot in the meantime, and it will give me the opportunity to write new utilities and shell scripts and Perl programs and Python programs. For me, this is fun.