Extracting audio from DVDs
#72 Henry, Thursday, 08 July 2010 4:19 PM (Category: Music)
(Tags: music bollywood dvd)

I watch a lot of Bollywood films and I love the song-and-dance routines. I find it hard to get the soundtracks on CD, so I extract the music off the DVD. I do this under Linux using a number of different tools.

First, I use mplayer on the command line to dump the audio as WAV. Then I use Audacity to cut out the songs. And then I use my regular tools to convert them to MP3 and Ogg.

To get the music off the DVD, I have a number of techniques and a number of scripts, depending on how the DVD is put together. I have a way of doing it that works, and I'm going to describe it. I think it works this way, but my knowledge is not exact; it was acquired through erratic trial and error. I could be wrong, but this works for me.

First, you have to work out the content of a DVD. If you mount a DVD, it will mount as a data DVD containing two directories: AUDIO_TS and VIDEO_TS. AUDIO_TS is usually empty. I have seen stuff in it, but it doesn't concern me. The contents of VIDEO_TS are what gets played when you put the DVD into a player and press Play.

In VIDEO_TS there are a bunch of files. The ones that contain the data we need are the ones with the VOB extension. Here's an example of one film's VOB files:

VTS_01_1.VOB
VTS_02_1.VOB
VTS_03_0.VOB
VTS_03_1.VOB
VTS_03_2.VOB
VTS_03_3.VOB
VTS_03_4.VOB
VTS_03_5.VOB
VTS_04_1.VOB
VTS_05_1.VOB
VTS_06_1.VOB
VTS_07_1.VOB

The filename has two numbers. The first is the title set, which usually holds one title (though not always, so the title numbers mplayer uses don't necessarily match). The second is the part number: part 0 holds the menus for that title set, and the video itself is split across parts 1, 2, 3 and so on, because a single VOB file can be no larger than 1 GB. So this DVD has 7 title sets, and title set 3 is big enough to be spread over five files plus its menu file. The split into files has nothing to do with chapters; the chapter layout is stored in the IFO files that sit next to the VOBs. Think of a "title" as a section of video. One "title" could be the whole movie. Another could be a promo, another a special feature like "The Making Of The Movie", another could be a block of deleted scenes. The "chapter" is a portion of a title. For example, the "title" could be the deleted scenes, and the "chapters" within that title would be each of the deleted scenes. Some titles are very small and they relate to linking material on the DVD, some are medium size and they are promos for other films or special features, and some are huge and they are the movie. You will often see two versions of the movie, one widescreen and one fullscreen, and they will be two "titles".
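In the standard DVD-Video layout, a VOB filename can be decoded mechanically: VTS_TT_P.VOB is title set TT, and P is 0 for that set's menus or 1 upward for the pieces of video. Here is a minimal sketch of that decoding; the function name vob_info is made up for this example.

```shell
#!/bin/sh
# Sketch: describe a VOB file from its name (VTS_TT_P.VOB).
# TT = title set number; P = 0 for the set's menus, else a video part.
vob_info() {
  name=$(basename "$1" .VOB)          # e.g. VTS_03_2
  case $name in
    VTS_[0-9][0-9]_[0-9]) ;;
    *) echo "not a title set VOB: $1" >&2; return 1 ;;
  esac
  set_num=$(expr "$(echo "$name" | cut -d_ -f2)" + 0)   # "03" -> 3
  part=$(echo "$name" | cut -d_ -f3)
  if [ "$part" = 0 ]; then
    echo "title set $set_num, menu"
  else
    echo "title set $set_num, part $part"
  fi
}
```

Something like `for f in /media/dvd/VIDEO_TS/VTS_*.VOB; do vob_info "$f"; done` would describe every file; the mount point there is only an example.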

You have to work out which title is the actual movie you want. Once you look at the VOBs, you can usually make a pretty good guess: the movie is the title set with the most data, often spread across several VOB files. If there are two big ones, one will usually be widescreen and one fullscreen, and you will have to watch both to determine which is which. But if all you want is the audio, it doesn't matter.
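Since a big title set is split over several files, "biggest" really means the largest total across its parts. A minimal sketch that adds up the sizes per title set, assuming the VTS_TT_P.VOB naming and skipping the part 0 menu files; the function name biggest_title_set is hypothetical, and the title set number it prints usually, but not always, matches the title number you would pass to mplayer.

```shell
#!/bin/sh
# Sketch: print the title set whose video VOBs (parts 1-9) are largest in total.
biggest_title_set() {
  dir=${1:-.}
  for f in "$dir"/VTS_[0-9][0-9]_[1-9].VOB; do
    [ -f "$f" ] || continue             # no match: the glob stays literal
    tt=$(basename "$f" | cut -d_ -f2)   # title set number, e.g. 03
    size=$(wc -c < "$f")                # file size in bytes
    echo "$tt $size"
  done |
  awk '{ total[$1] += $2 }              # sum the parts of each title set
       END {
         for (t in total)
           if (total[t] > max) { max = total[t]; best = t }
         if (best != "") print best + 0 # "03" -> 3
       }'
}
```

For example, `biggest_title_set /media/dvd/VIDEO_TS` (the path is only an example).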

So, if I look at the DVD whose VOB files I listed above, and see that title 2 is the biggest file, then that's a pretty good indication that it's the movie. I will extract the audio for that whole title and dump it to a wav file. Then I will load that into Audacity and break the music tracks out one by one.
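Instead of judging by file size, you can ask mplayer how long each title runs. This sketch assumes your MPlayer build prints ID_DVD_TITLE_<n>_LENGTH=<seconds> lines when run with -identify on a DVD; the function name longest_title is hypothetical. It reads that output on stdin and prints the number of the longest title.

```shell
#!/bin/sh
# Sketch: pick the longest title from "mplayer -identify" output on stdin.
longest_title() {
  awk -F= '/^ID_DVD_TITLE_[0-9]+_LENGTH=/ {
             n = $1
             gsub(/[^0-9]/, "", n)        # keep only the title number
             if ($2 + 0 > max) { max = $2 + 0; best = n }
           }
           END { if (best != "") print best }'
}
```

A possible invocation: `mplayer -identify -frames 0 -vo null -ao null dvd:// 2>/dev/null | longest_title`.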

I have a script that helps me extract the audio for an entire title, get_audio_title.sh:

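#!/bin/sh
# get_audio_title.sh TITLE: dump the audio of one DVD title to title_TITLE.wav
#   -vc null -vo null   don't decode or display the video
#   -ao pcm:...         write the audio to a WAV file, faster than realtime
#   -af resample=...    resample to the 44.1 kHz CD rate
[ -n "$1" ] || { echo "usage: $0 TITLE" >&2; exit 1; }
TITLE=$1
mplayer \
  -vc null \
  -vo null \
  -ao pcm:fast:waveheader:file=title_${TITLE}.wav \
  -cache 8192 \
  -af resample=44100:0:0 \
  dvd://${TITLE}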

I run this with "get_audio_title.sh 2" and mplayer will read title number 2, extract the audio as WAV, resample it from the DVD's usual 48 kHz to the CD rate of 44.1 kHz, and save it as "title_2.wav". Then I load this into Audacity and work with it. While it's extracting, you will see complaints about speed and caching. Ignore them. It's normal for audio dumping.
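If you want to confirm that the resample took effect before loading the file into Audacity, the sample rate can be read straight from the WAV header. A minimal sketch, assuming the "fmt " chunk comes first as in ordinary WAV files, so the rate sits little-endian in bytes 24 to 27; the function name wav_rate is hypothetical.

```shell
#!/bin/sh
# Sketch: print the sample rate stored in a WAV file's header.
wav_rate() {
  [ "$(head -c 4 "$1")" = RIFF ] || { echo "not a WAV file: $1" >&2; return 1; }
  # Read bytes 24-27 one at a time and combine them little-endian,
  # so the result doesn't depend on the host's byte order.
  od -An -tu1 -j24 -N4 "$1" |
    awk '{ print $1 + 256 * $2 + 65536 * $3 + 16777216 * $4 }'
}
```

After the script above has run, `wav_rate title_2.wav` should print 44100.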

If I know that I want multiple specific titles, say the two versions of the movie and the special features, I have another script, and I modify the TITLES line for each DVD once I know more about it.

#!/bin/sh
# Dump the audio of several DVD titles to title_N.wav; edit TITLES per DVD.
TITLES="2 3 5 6 7"
for title in $TITLES
do
  echo "Title $title"
  mplayer \
      -vc null \
      -vo null \
      -ao pcm:fast:waveheader:file=title_${title}.wav \
      -cache 8192 \
      -af resample=44100:0:0 \
      dvd://${title}
done

This will extract the audio for each title, resample it to 44.1 kHz, and save each one with a filename like title_2.wav, title_5.wav, etc.
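Rather than typing the TITLES list by hand, it could be built from mplayer's -identify output by keeping only titles longer than some cutoff, which drops menus and short linking clips. This sketch assumes the same ID_DVD_TITLE_<n>_LENGTH lines as above; the function name titles_longer_than and the 120-second cutoff in the usage line are illustrative choices, not fixed values.

```shell
#!/bin/sh
# Sketch: print a space-separated list of titles longer than $1 seconds,
# reading "mplayer -identify" output on stdin.
titles_longer_than() {
  awk -F= -v min="$1" '
    /^ID_DVD_TITLE_[0-9]+_LENGTH=/ {
      n = $1
      gsub(/[^0-9]/, "", n)            # title number from the key
      if ($2 + 0 > min + 0) { printf "%s%s", sep, n; sep = " " }
    }
    END { print "" }'
}
```

For instance: `TITLES=$(mplayer -identify -frames 0 -vo null -ao null dvd:// 2>/dev/null | titles_longer_than 120)`.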

Sometimes a title is divided into several chapters. My Bollywood DVDs often have a section with extended versions of the best songs in the film. That section is one title, and each song inside it is a chapter. I have another script that extracts these individual chapters, but because the layout changes per film, I have to edit the script for each DVD.

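#!/bin/sh
# Dump each chapter of one title to track_TITLE_CHAPTER.wav; edit per DVD.
TITLE="3"
CHAPTERS="1 2 3 4 5"
for chapter in $CHAPTERS
do
  echo "Title $TITLE - chapter $chapter"
  # -chapter N-N plays from the start of chapter N to the end of chapter N
  mplayer \
      -vc null \
      -vo null \
      -ao pcm:fast:waveheader:file=track_${TITLE}_${chapter}.wav \
      -cache 8192 \
      -af resample=44100:0:0 \
      -chapter ${chapter}-${chapter} \
      dvd://${TITLE}
done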

This script extracts the individual chapters of title 3, resamples them to 44.1 kHz, and saves them with filenames like track_3_1.wav, track_3_2.wav, etc.
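The CHAPTERS line could also be filled in from the chapter count mplayer reports. This sketch assumes -identify prints an ID_DVD_TITLE_<n>_CHAPTERS=<count> line for each title; the function name chapters_of is hypothetical. It matches the key exactly, so title 3 is not confused with title 13.

```shell
#!/bin/sh
# Sketch: print "1 2 ... N" for the chapters of title $1,
# reading "mplayer -identify" output on stdin.
chapters_of() {
  awk -F= -v t="$1" '$1 == "ID_DVD_TITLE_" t "_CHAPTERS" {
      for (i = 1; i <= $2; i++) printf "%d%s", i, (i < $2 ? " " : "\n")
  }'
}
```

For example: `CHAPTERS=$(mplayer -identify -frames 0 -vo null -ao null dvd:// 2>/dev/null | chapters_of "$TITLE")`.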

Sometimes this will not work very well. On most DVDs I can extract the chapters okay, but I got one DVD from a recent Taste Of India festival. It was a pirate DVD that contained 100 video clips of Aishwarya Rai singing from her films. Title 1 was the main menu, then there were titles 2 to 11 and each title had exactly 10 chapters. 10 titles x 10 chapters = 100 video clips. I extracted by chapter, but the times were off (I suspect the DVD was poorly mastered) and I lost chunks of time at the start and end of each chapter. So I had to extract it by title, load it into Audacity and break each title manually into the 10 chapters. That took some time, but it sure was worth it.

But anyway, one way or another, once I have the contents of the DVD converted to wav, either directly through mplayer or by breaking it with Audacity, I can then convert to mp3 or ogg and then I have the soundtrack to the film to carry around with me and enjoy the way I want to.
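The conversion step depends on whatever encoders you have installed. As an assumption for illustration, this sketch uses lame for MP3 and oggenc for Ogg Vorbis, with VBR quality settings picked as reasonable defaults rather than anything prescribed; the function name encode_wav is hypothetical.

```shell
#!/bin/sh
# Sketch: encode one WAV file to both MP3 (lame) and Ogg Vorbis (oggenc),
# writing name.mp3 and name.ogg next to name.wav.
encode_wav() {
  wav=$1
  base=${wav%.wav}
  lame -V 2 "$wav" "$base.mp3" &&
    oggenc -q 6 -o "$base.ogg" "$wav"
}
```

To convert everything in the current directory: `for f in *.wav; do [ -f "$f" ] && encode_wav "$f"; done`.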
