DVD Ripping and Encoding – some success achieved

Up till now, I have been ripping and encoding DVDs on my Macs. I use Mac The Ripper to rip the DVD and store the entire thing on disk in a VIDEO_TS directory. Then I use HandBrake to encode those VIDEO_TS directories to single files in a mk4 format, and for the foreign films I get it to include subtitles.

But my Mac is worn out and the DVD drive will no longer work. Mac The Ripper has disappeared. The Mac way does not suit my workflow. I wanted to try and move the process to my Linux Slackware desktop. Last time I tried this was with Slackware 13.0 and I failed at installing dvdrip. Now I use Slackware 13.37 and I had hopes that I could get something installed.

I struggled to install dvdrip, and eventually I got it to install. Unfortunately it will not rip my DVDs. There is some mismatch or error and it will not work. I have abandoned dvdrip for the time being.

I discovered a different path and have gone down it.

I now use a number of tools to do my ripping.

I use vobcopy to rip a DVD and create a VIDEO_TS directory with all the components of a DVD in it. I mount the DVD first, not sure why I have to do this, but I mount it, then run vobcopy. Like this:

mount /mnt/cdrom
vobcopy -v -m -i /mnt/cdrom -o /data/Movies

This works exactly like Mac The Ripper does on my Macs, except there are some minor problems. I have only tried it with Region 1 DVDs so far, and don’t know how it will handle other regions yet. I have tried it with foreign language films (Kung Fu Hustle, Curse of the Golden Flower, House of Flying Daggers) and it will get part way through the DVD and then lock up. These are still Region 1 DVDs, but they are foreign language with subtitles. On the other hand, I have Watchmen and it has subtitles but does not lock up. There is something strange with this that I do not understand.
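Until the lock-up is understood, one option is to run vobcopy under a time limit so a stalled rip doesn't hang a batch forever. This is a minimal Python sketch, not something vobcopy provides itself. It rebuilds the `vobcopy -v -m -i /mnt/cdrom -o /data/Movies` command above and runs it with `subprocess.run(..., timeout=...)`. The three-hour default is an arbitrary assumption. It also only helps if the stalled process can still be killed.

```python
import subprocess


def vobcopy_cmd(mount_point, out_dir, verbose=True):
    """Argument list for the vobcopy command above; -m mirrors the whole disc."""
    cmd = ["vobcopy"]
    if verbose:
        cmd.append("-v")
    cmd += ["-m", "-i", mount_point, "-o", out_dir]
    return cmd


def rip_with_timeout(mount_point, out_dir, timeout_s=3 * 60 * 60):
    """Run vobcopy but give up after timeout_s seconds (assumed: 3 hours).

    Returns True if vobcopy exited with status 0. Returns False if it
    failed, or if it was still running at the deadline, in which case
    subprocess.run kills it before raising TimeoutExpired.
    """
    try:
        result = subprocess.run(vobcopy_cmd(mount_point, out_dir), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

After mounting the disc, `rip_with_timeout("/mnt/cdrom", "/data/Movies")` returns False for a disc that stalls, so a script can note it and move on to the next one.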

The second part of the operation is to create a mk4 or mp4 file, a small encoded file with just the essential movie in it that I can play on a laptop. The first step is to determine which title on the DVD is the actual movie. I use dvdrip to view the VIDEO_TS directory. It selects the title it thinks is the movie, probably by the length and the number of frames or chapters. I can play it from dvdrip to make sure I have the right title. Once I am sure, I quit dvdrip and move to the next tool.
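dvdrip's guess can be approximated outside dvdrip, because lsdvd (one of dvdrip's own dependencies) lists every title with its length. The sketch below makes two assumptions. First, that the main feature is the longest title. Second, that lsdvd prints per-title lines shaped like `Title: 01, Length: 01:42:37.000 Chapters: ...`. It is worth playing the chosen title before encoding anyway.

```python
import re

# Assumed shape of lsdvd's per-title lines, e.g.
# "Title: 01, Length: 01:42:37.000 Chapters: 28, Cells: 29, ..."
TITLE_RE = re.compile(r"Title:\s*(\d+),\s*Length:\s*(\d+):(\d+):(\d+)")


def title_lengths(lsdvd_output):
    """Return {title_number: length_in_seconds} parsed from lsdvd text."""
    lengths = {}
    for line in lsdvd_output.splitlines():
        m = TITLE_RE.search(line)
        if m:
            title, hours, mins, secs = (int(g) for g in m.groups())
            lengths[title] = hours * 3600 + mins * 60 + secs
    return lengths


def guess_main_title(lsdvd_output):
    """Heuristic: the main feature is usually the longest title."""
    lengths = title_lengths(lsdvd_output)
    if not lengths:
        return None
    return max(lengths, key=lengths.get)
```

The number it returns is what would go into HandBrakeCLI's `-t` switch.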

I installed HandBrake. This gives me a command line version called HandBrakeCLI. There are instructions for the command line version of Handbrake. The title number I discovered in dvdrip goes in switch -t. For my first cut, I am using this command line to create an mp4 file:

HandBrakeCLI -v --preset=Normal -t1 -i /data/Movies/BUBBA_HO_TEP/VIDEO_TS -o ~/Bubba_Ho_Tep.mp4

Just like on the Mac, HandBrake takes time to do this encoding. I haven’t mastered subtitles yet, but it’s supposed to handle them. The end result is very nice when played back with mplayer, but xine has a really hard time with it. Lots of playback errors, pixelation, stuttering. I will have to experiment with the settings to get a format that is rock solid.
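Once there are several VIDEO_TS directories under /data/Movies, the HandBrakeCLI line above can be generated for each one. This Python sketch assumes a hand-maintained mapping from rip directory name to the title number found in dvdrip. The `Bubba_Ho_Tep.mp4` naming rule is just the pattern from the example above, not anything HandBrake does.

```python
from pathlib import Path


def handbrake_cmd(video_ts, output, title, preset="Normal"):
    """Argument list matching the HandBrakeCLI command line above."""
    return ["HandBrakeCLI", "-v", f"--preset={preset}", f"-t{title}",
            "-i", str(video_ts), "-o", str(output)]


def batch_cmds(movies_root, out_dir, titles):
    """One HandBrakeCLI command per ripped disc under movies_root.

    titles maps a rip directory name (e.g. "BUBBA_HO_TEP") to the title
    number found in dvdrip. Directories without an entry, or without a
    VIDEO_TS subdirectory, are skipped. Output naming is a hypothetical
    convention: BUBBA_HO_TEP -> Bubba_Ho_Tep.mp4.
    """
    cmds = []
    for disc in sorted(Path(movies_root).iterdir()):
        video_ts = disc / "VIDEO_TS"
        if disc.name not in titles or not video_ts.is_dir():
            continue
        name = "_".join(w.capitalize() for w in disc.name.split("_")) + ".mp4"
        cmds.append(handbrake_cmd(video_ts, Path(out_dir) / name, titles[disc.name]))
    return cmds
```

Each returned list can be handed to `subprocess.run` in turn.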

So, despite major ups and downs with installing software, I now have a method of ripping and encoding most of my DVDs on my Linux Slackware desktop. It’s a good start. I will keep working at it, and try and solve the subtitles, and try and solve the foreign language ripping problem.

Finalising a dvdrip installation on Slackware 13.37

I resumed my effort to install dvdrip on Slackware 13.37 using sbopkg.

The problem was that during the creation of the package, it was failing because it could not find any man pages.

Appending installation info to /tmp/SBo/package-dvdrip/usr/lib/perl5/perllocal.pod
mv: cannot stat `/tmp/SBo/package-dvdrip/usr/share/man': No such file or directory

So it’s time to leave sbopkg and do it manually. I went to slackbuilds.org and searched for dvdrip, downloaded the source tarball – dvdrip-0.98.11.tar.gz – and the slackbuild tarball – dvdrip.tar.gz – and put them together. Same result, but I could look inside dvdrip.SlackBuild and see what was happening. It wants to copy man pages from the source tarball, but there aren’t any in there. I commented out that section, and tried it again. Success. It finished and it created a package to install. I installed it. Then I ran dvdrip.

Well, would you look at that. I get a brief flash of a splash screen and then a dvdrip window ready for action. And there’s a very nice feature – Debug -> Check Dependencies. I run this and get this screen:

[dvdrip dependency screen]

There are three issues:

  • rar is a worry, because I’ve got version 4.01 installed and they want a maximum of 2.99. I read the dvdrip notes, and they say version 3 and up will not work, and they provide a version of rar that does. I’ll uninstall the current version of rar, and install the one that they recommend.
  • I don’t care about mjpegtools because I have no need of VCD or SVCD encoding.
  • fping bothers me because I did install it, and dvdrip is not recognising it.

So fping needs work. I did install it, but it’s installed in /usr/sbin, which is not in a regular user’s PATH, so dvdrip could not find it. I set up a symbolic link:

ln -s /usr/sbin/fping /usr/bin/fping

and tried it:

This program can only be run by root, or it must be setuid root.

Hmm. Okay, I’ll accept this challenge and the security risk.

chmod u+s /usr/sbin/fping

and there we go. dvdrip can see it and it’s okay. It’s not really important, it’s there for clustering and I have no need to rip DVDs and encode them over a cluster of computers. I don’t have a cluster of computers.
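The two fping fixes can be checked from a script: the tool must be on a regular user's PATH, and it must be setuid root. This is a minimal standard-library Python sketch. It presumably looks for the same things as dvdrip's dependency check, but that is an assumption.

```python
import os
import shutil
import stat


def tool_status(name):
    """Report whether `name` is on PATH and whether it is setuid root."""
    path = shutil.which(name)
    if path is None:
        return {"found": False, "path": None, "setuid_root": False}
    st = os.stat(path)  # follows the symlink, so this checks the real binary
    return {
        "found": True,
        "path": path,
        "setuid_root": bool(st.st_mode & stat.S_ISUID) and st.st_uid == 0,
    }
```

Run as a normal user, `tool_status("fping")` should report both `found` and `setuid_root` as True once the symlink and `chmod u+s` are in place.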

And now, we have no real problems with dependencies:

[dvdrip good dependencies]

Now to see if I can rip a DVD. I put in a DVD, I create a new project, and wow, this looks really nice. I get things set up, read the table of contents, it automatically selects what it thinks is the real movie amongst all the junk, and it’s right. Then I hit Rip It. This is what I get.

[dvdrip transcode problems]

I can play that part of the DVD fine. I can look at it, but I can’t rip it. More research necessary.

Okay, that’s pretty sad. This appears to be a known problem, and no-one has figured out what it is. There are heaps of links to forums about it.

But in my reading of these forums, I came across mention of vobcopy. I installed it, and played with it, and it does part of what I want. If I use a command like this:

mount /mnt/cdrom
vobcopy -i /mnt/cdrom -m -o ~/Movies

Then under ~/Movies it will create a directory named after the DVD (e.g. WATCHMEN), and under that it will create the VIDEO_TS directory, and then in there it will rip the entire DVD and create the VOB and BUP and IFO files. This is great. This is the step that I did on my Mac with Mac The Ripper. From this directory, I can recreate the DVD. I can also play the DVD using xine or mplayer, and I might even be able to feed it to dvdrip and just use dvdrip for the encoding to other formats. More experimentation needs to be done.
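Because vobcopy has been locking up part way through some discs, a quick sanity check on the resulting VIDEO_TS directory is handy. The checks here are assumptions about what a complete mirror looks like: VIDEO_TS.IFO is present, every .IFO has its .BUP backup, and there is at least one .VOB. They won't catch a .VOB that was truncated when a rip stalled.

```python
from pathlib import Path


def check_video_ts(video_ts):
    """Return a list of problems with a vobcopy -m rip; empty means it looks complete."""
    d = Path(video_ts)
    if not d.is_dir():
        return [f"{d} is not a directory"]
    names = {p.name.upper() for p in d.iterdir()}
    problems = []
    if "VIDEO_TS.IFO" not in names:
        problems.append("missing VIDEO_TS.IFO")
    # Each .IFO control file on a DVD has a .BUP backup copy with the same stem.
    for ifo in sorted(n for n in names if n.endswith(".IFO")):
        if ifo[:-4] + ".BUP" not in names:
            problems.append(f"{ifo} has no matching .BUP")
    if not any(n.endswith(".VOB") for n in names):
        problems.append("no .VOB files")
    return problems
```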

Trying to rip DVDs on Slackware Linux

I have failed to install dvdrip on Slackware 13.37. It’s a little failure and I might be able to get past it, but I was told that Handbrake works on Linux too. I’ve been using Handbrake on the Mac for a while and it’s quite nice, so I thought I would try to install it before going back to figuring out the problems with dvdrip.

So I fired up sbopkg and found handbrake and started the process. Interestingly, Handbrake requires a lot of libraries but these are all included in the Handbrake sbopkg package, and they get downloaded and compiled as well. That’s a neat feature. All these dependencies compiled just fine, which did surprise me. I see a lot of warnings in the compilations, and that’s not something I like to see. In my own coding, I aim for no warnings. I want the code squeaky clean. So seeing hundreds of warnings for silly things is disconcerting.

Unfortunately, there are some dependencies that handbrake expects that are not supplied, and the sbopkg compilation stops. It needed webkit, and that failed because icu4c wasn’t available. I installed icu4c successfully. Then webkit needed libsoup. I installed that okay. Then Webkit failed on a Perl library – Switch.pm.

Can't locate Switch.pm in @INC

I try and use cpan to install Switch.pm, but that leads me down to a deeper mess. Switch.pm needs Filter::Util::Call and Text::Balanced. They both install successfully. Switch.pm fails two tests and cpan won’t let me install it, unless I force it. I force it. Then I try webkit again. Good grief, webkit takes a long time to compile. I went to a long lunch, I came back, and an hour after that, it finished. Fantastic. Time to try and install Handbrake again. I start it running and go do things, and eventually it finishes successfully. Wonderful.

But when I went to run it, all I see is HandBrakeCLI. What? No GUI version? Did I miss something? More research needed.

Command line tools examples

Sometimes man pages don’t have examples. Sometimes you don’t care about 400 optional switches, you just want to see an example of a common way of running a tool. ExampleNow.com gives these examples. Wonderful.

Trying to install dvdrip on Slackware again

I want to install dvdrip on my everyday Slackware desktop. Last time I tried this, I failed miserably, but that was with Slackware 13.0. Now we are at Slackware 13.37 and I thought things might have improved, so I tried again.

First of all, I went to the dvdrip install page and looked at all the dependencies. That’s one hell of a list.

I started with the Perl modules, and used CPAN. They all installed easily.

Then I started on the command line dependencies. I had to install some extra stuff, and do it in a specific order to get them all installed. I used sbopkg to install all these. This is what I installed, in the order I did it, and the name sbopkg requires.

  • x264
  • ffmpeg
  • transcode
  • xvid4conf
  • lsdvd
  • fping
  • ogmtools
  • subtitleripper
  • rar

transcode and ffmpeg were the ones that failed spectacularly last time, but this time they installed nicely. subtitle2pgm used to be included with transcode, but now it’s on its own, and you install it as part of subtitleripper. Hal appears to be already installed. ImageMagick, mplayer and xine are already installed.

I failed on one package – mjpegtools. It requires a Linux kernel header – linux/videodev.h – and apparently support for the original V4L (version 1) API, which that header belongs to, has been dropped from the kernel lately. So I can’t install this package. It’s for VCD and SVCD support, which I don’t think I need, and it’s an optional package, so I ignored it.

Getting this far and succeeding with all the important packages that dvdrip needs was a very positive step. I did not get this far last time. This gave me great hope that dvdrip would install.

Then I used sbopkg to try to install dvdrip. It failed. It was unable to create a man directory – /tmp/SBo/package-dvdrip/usr/share/man. I tried manually creating the directory and trying to continue the installation from that point, or restarting it, but it must delete the directories first and then recreate them (forgetting to create man) each time.

And that’s where I am stuck now. I think I will download the sbo package and modify it and fix it if I can understand it. I’ll try that tomorrow.

I’m so close to installing it.

If I can’t, my alternative is to find a distribution that has dvdrip installed by default, and install that distro inside VirtualBox. But that’s a last resort.

Troubles with CentOS

At work, we have a number of small servers set up for specific tasks. One of them is being upgraded. We’re building a new one with everything on it, and when it’s ready, we will switch it in. The old one is running “Red Hat Linux release 9 (Shrike)” so it’s pretty old. The new one we are building has been installed with “CentOS release 5.6 (Final)”. So far it’s been a disaster setting the new one up.

There are two main problems.

First, the IT guy who installed it, who does all these installations and loves CentOS, does a minimal install. So much stuff I need is missing right from the start. The libxml2 development libraries. Midnight Commander. After the last time I installed our software on one of these CentOS boxes, I have a list of the missing software and I give it to him and ask to have it installed. He does. But this box has to do a lot more work and it needs a lot of stuff that I consider to be normal software on a Linux server, but he considers to be “extras”. I can’t even set up a cron job without going in as root first and adding my user to /etc/cron.allow.

This box has to accept email on specific domains, pass them to two users, get procmail to process the emails and pass them to two applications which parse them, do database lookups, process them and trigger a range of actions, and send replies back. And the email has to be spam checked. We’ve used sendmail and spamassassin quite successfully up till now. Last time I set this system up, it took me about 6 hours from scratch, most of it spent installing spamassassin. I have taken about 6 days so far on this task, and it’s still not done. I’m waiting for our IT guy to work out how to get the applications to run.

I started configuring it, and kept running into problems. I have to configure sendmail. Oh, you want to configure sendmail? You’ll need the sendmail-cf package installed. WTF? This is a core part of a server, in my opinion, but no, here it’s an optional extra. I start installing spamassassin, and all the Perl modules it needs. Fail. Some of them have C++ components, and oh dear, the C++ compiler component of gcc is not installed. I need the gcc-c++ component, so I have to ask for it to be installed, and wait. After that, it went okay. But then tying sendmail and spamassassin together got difficult.

libmilter not installed. Oh, you need the sendmail-devel package for that. Eventually I get sendmail configured, spamassassin installed, libmilter installed, spamass-milter downloaded and installed, startup and shutdown scripts created and installed, and it all works. I install the applications that need to run at the end of the chain, and I test them manually and they work fine. I configure procmail for the two users, set the whole chain up and do the final tests. This involves telnetting to the box on port 25 and manually running through SMTP transactions to trigger the whole thing.
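The manual telnet session on port 25 can be scripted, so the whole chain from sendmail through spamassassin and procmail to the applications gets re-tested the same way every time. This is a minimal sketch using Python's standard smtplib and email modules. The addresses and host are hypothetical placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient, subject="chain test", body="test"):
    """Build the message that would otherwise be typed by hand over telnet."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_test_message(host, msg, port=25):
    """Deliver msg over plain SMTP, as the telnet session does.

    smtplib performs the EHLO/HELO, MAIL FROM, RCPT TO and DATA steps.
    Returns a dict of refused recipients (empty if all were accepted).
    """
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        return smtp.send_message(msg)
```

For example, `send_test_message("newbox.example", build_test_message("tester@example.org", "pager@thisdomain.org"))`, where both names are placeholders, then check the procmail log as before.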

Then I discovered the second big problem. CentOS has a security mode, SELinux, where you can specify with very fine granularity what programs can and can’t do.

My first test showed this problem. sendmail accepted the email, passed it successfully through spamassassin, handed it to the user via procmail and procmail tried to pass it to the application. I got this in the procmail logs:

/usr/local/epager/bin/epager: line 4: syntax error near unexpected token `('
/usr/local/epager/bin/epager: line 4: `use POSIX qw (strftime);'

The offending line was a simple use POSIX. At first I didn’t know what was going on. I removed the POSIX module and wrote a workaround for strftime. This time it treated every line of Perl as something bad. It was as if it was treating the Perl program as a shell script and failing on every line.

/usr/local/epager/bin/epager: line 13: =: command not found
/usr/local/epager/bin/epager: line 16: =: command not found
/usr/local/epager/bin/epager: line 17: =: command not found
/usr/local/epager/bin/epager: line 18: =: command not found
/usr/local/epager/bin/epager: line 19: =: command not found
/usr/local/epager/bin/epager: line 20: =: command not found
/usr/local/epager/bin/epager: line 21: =: command not found
/usr/local/epager/bin/epager: line 22: =: command not found

I spent a lot of time trying all sorts of things. Days were lost. Eventually I realised that this was not a problem with my code. It runs perfectly if I am at the command line and run it manually. Procmail was not doing it right. I went and spoke to the IT guy and described the scenario. He told me about the fabulous new security mode that CentOS has. He has to add some clauses to the security configuration somewhere to allow procmail to run my apps. He can’t work out what I am trying to do, so he gets me to run as root

/usr/sbin/setenforce 0

and run my tests again while he looks at some security logs and works out what he needs to do. We iterate through this many times and eventually we get to the point where one app will run mostly, but cannot create a new log file (although it can append to existing ones), and the second app will run but do absolutely nothing. Note that these run perfectly if you run them at the command line. Running them from procmail is not successful. And that’s where we are right now.
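While the security logs are being read, a rough summary of what SELinux denied can narrow things down. For instance, it can show whether the failing log-file creation appears as a denied `create`. This Python sketch assumes denials appear as `type=AVC ... avc: denied { ... }` lines with `comm=`, `name=` and `tclass=` fields. On CentOS these are typically in /var/log/audit/audit.log when auditd is running, but check. It only summarises; it does not write any policy.

```python
import re
from collections import Counter

# Assumed shape of an SELinux denial line, e.g.
# type=AVC msg=audit(...): avc:  denied  { create } for  pid=1 comm="epager" name="x.log" ... tclass=file
DENIED_RE = re.compile(r"avc:\s+denied\s+\{\s*([^}]*?)\s*\}")
FIELD_RE = re.compile(r'\b(comm|name|tclass)=("?)([^"\s]+)\2')


def summarize_denials(lines):
    """Count (program, permission, target name, object class) tuples."""
    counts = Counter()
    for line in lines:
        m = DENIED_RE.search(line)
        if not m:
            continue
        fields = {key: value for key, _, value in FIELD_RE.findall(line)}
        for perm in m.group(1).split():
            counts[(fields.get("comm"), perm, fields.get("name"), fields.get("tclass"))] += 1
    return counts
```

Passing it the lines of the audit log and printing `most_common()` gives a short list of which program was refused what, on which file.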

We have to have the logs. We occasionally get subpoenaed and have to get all sorts of details about connections and emails. Without the logs, we can’t do what needs to be done.

CentOS is really secure, so secure that the apps that need to run will not run. I’ve currently spent six days on this task that should have taken 6 hours. I’m not happy, my bosses are not happy. I just have to wait till he can figure it out. I am asking for this new excessive security mode to be removed, but I keep getting told “I’ll work it out”.

The only good side is that everything that I have had to do has been documented, so next time I have to set something like this up, I will be able to do it in 6 hours.

Another Mac OS X tool of use – textutil

OS X Daily has another neat tip about a Mac tool of great use – textutil.

SEO sleepless nights

I’ve spent several almost sleepless nights now, completely overhauling all the websites I have. I’ve cleaned them up, upgraded the HTML, rewritten PHP, added all sorts of things for SEO, and finally I am fairly happy. I have deleted some, I have resurrected some. I still have a couple of small packages to upgrade – Serendipity and SquirrelMail – and one very old website to completely rewrite to PHP. But finally I have a grip on it all.

Probably by the end of the year, I hope to see Google’s disapproval of me lifted.

Validating HTML and SEO

I have a number of websites for service groups, especially one for Anne. I was doing some Google fiddling and got alarmed by a number of things I saw. I found my way to HTML validators.

These are websites where you can give them a URL and they will check the URL and report on your failures. The main one I used was the W3C Validator. That opened up a whole mess of problems. I thought my HTML was pretty good until I fed it into that. I had to get the DOCTYPE right, and then add an xmlns clause to my html tag. Then make everything validate to XHTML, so singleton tags had to have the closing slash at the end, and some things did not fit in. Some things puzzled me, like not being able to put a ul inside a p. But I went through all my code, and I cleaned up everything that was suggested, until everything got the green tick of approval. I felt virtuous.

Then I came to this other validator page – The WDG HTML Validator. This one will try and spider through your whole site instead of just one page, and it found a bunch more problems. On my photos page, there was a markup error if the matrix was filled exactly, with no empty slots. An off-by-one error. In the calendar, it found an empty row when the 1st of the month started on a Sunday. I ended up with

<tr></tr>

and this is not allowed in XHTML. I have been trying to debug this one, but it’s really tricky. I will have to spend more time to fix it.
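Both bugs come down to emitting a row boundary when there is nothing left to put in the row. The site is PHP, so this is only a Python sketch of the logic. For the photos, chunk the cells first and pad only a short last row. For the calendar, let the standard library build the Sunday-first weeks, so a month that starts on a Sunday never produces an empty `<tr></tr>`.

```python
import calendar


def grid_rows(cells, cols, filler="<td></td>"):
    """Chunk cells into <tr> rows of exactly `cols` cells.

    Only a short final row is padded, so a grid that is filled exactly
    gets no extra empty row: 6 photos in 3 columns -> 2 rows.
    """
    rows = []
    for start in range(0, len(cells), cols):
        row = list(cells[start:start + cols])
        row += [filler] * (cols - len(row))
        rows.append("<tr>" + "".join(row) + "</tr>")
    return rows


def calendar_rows(year, month):
    """Sunday-first month grid; monthdayscalendar never yields an all-blank week."""
    weeks = calendar.Calendar(firstweekday=calendar.SUNDAY).monthdayscalendar(year, month)
    return ["<tr>" + "".join(f"<td>{d}</td>" if d else "<td></td>" for d in week) + "</tr>"
            for week in weeks]
```

For example, May 2011 starts on a Sunday, and `calendar_rows(2011, 5)` gives five rows, the first beginning with day 1.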

So I started working to validate just the HTML. Then I noticed that there were suggestions for improvements that would help search engines understand your data better. I found myself looking at validating the web pages for SEO. I came across validator webpages, and started following their suggestions.

I added the appropriate meta tags, including the keywords. I know that not much, if any, credence is given to keywords these days by the search engines, but it helps a little if the keywords match the content. I set up methods of putting specific keywords per page. I’m not writing raw HTML here, I am writing in PHP and have functions to handle common elements of each page. I ended up with meta tags like this:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<meta name="description" content="gross keyword,less gross keyword,refined keyword"/>
<meta name="keywords" content="$keywords"/>
<meta name="robots" content="index,follow"/>
<meta name="author" content="Henry Griggs"/>
<meta name="copyright" content="2011, Henry Griggs"/>

All these were suggested to me by several SEO validator websites. I used these ones the most:

I also had to ensure that each page had an h1 tag that matched the keywords and title. Lots of sensible little things that made the website tight.
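The per-page meta tags and the single h1 can come from one helper, much like the PHP functions described above. This is a Python sketch of the idea, and the parameter names are illustrative. Escaping the values (PHP's htmlspecialchars() is the counterpart) keeps an ampersand or quote in a description from breaking XHTML validation.

```python
from html import escape


def meta_tags(description, keywords, author, year, robots="index,follow"):
    """Per-page XHTML meta tags like the ones above, with values escaped."""
    pairs = [
        ("description", description),
        ("keywords", ",".join(keywords)),
        ("robots", robots),
        ("author", author),
        ("copyright", f"{year}, {author}"),
    ]
    lines = ['<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>']
    lines += [f'<meta name="{name}" content="{escape(value, quote=True)}"/>'
              for name, value in pairs]
    return "\n".join(lines)


def h1(title):
    """One h1 per page, matching the page title."""
    return f"<h1>{escape(title)}</h1>"
```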

Then I learned about duplicate content. This was a problem. I have both thisdomain.com and thisdomain.org, and I point both those and www.thisdomain.com and www.thisdomain.org all at the same site. So every page can be referenced by four different URLs. That’s duplicate content, and it had to be fixed. I wasn’t sure how to do it initially. There seemed to be two ways to do it. One was with 301 redirects (what?) and the other was with a canonical meta tag. I was puzzled about how to approach it, so I decided to make my ignorance benefit me.

Stackoverflow.com has branched off into a whole series of satellite sites. One is for web development – Pro Webmasters. So I asked my question about duplicate content (and gained some points) and got a good answer and acknowledged it as a good answer (got some more points). Then I expanded from the answer and fleshed it out to cover all my duplicate problems.

First, I had to decide which one was going to be the definitive URL – thisdomain.com, www.thisdomain.com, thisdomain.org or www.thisdomain.org. I decided on thisdomain.org.

Then, I changed Apache and my virtual hosts configuration so I had a different directory for thisdomain.com and thisdomain.org. Previously, they were all going to the same directory. The vhosts setup was like this:

<VirtualHost *:80>
    ServerName www.thisdomain.org
    ServerAlias thisdomain.org
    DocumentRoot "/htdocs/thisdomain/org"
    ScriptAlias /cgi-bin/ "/htdocs/thisdomain/org/cgi-bin/"
    <directory /htdocs/thisdomain/org>
      allowoverride all
      order allow,deny
      allow from all
    </directory>
</VirtualHost>

<VirtualHost *:80>
    ServerName www.thisdomain.com
    ServerAlias thisdomain.com
    DocumentRoot "/htdocs/thisdomain/com"
    ScriptAlias /cgi-bin/ "/htdocs/thisdomain/com/cgi-bin/"
    <directory /htdocs/thisdomain/com>
      allowoverride all
      order allow,deny
      allow from all
    </directory>
</VirtualHost>

I left the directory for .org alone and created a new directory for the .com. In the directory for .com, I created a .htaccess file. I set it up per the answer on the webmasters site, like this:

Options +FollowSymlinks
RewriteEngine on
RewriteCond %{http_host} ^thisdomain.com [NC]
RewriteRule ^(.*)$ http://thisdomain.org/$1 [R=301,NC,L]

I restarted Apache and tested it. Great. If I entered the URL thisdomain.com/about.php, it would just bring up thisdomain.org/about.php. It worked great. I was really pleased with it. Then I used Google to bring up some search results and make sure they pointed to the right thing. Ugh. Total failure. All the links were like www.thisdomain.com/about.php and they all failed. I experimented a little and did a whole heap of reading about Apache’s rewrite rules, and came up with this new version of .htaccess for the .com directory.

Options +FollowSymlinks
RewriteEngine on
RewriteCond %{http_host} ^thisdomain.com [NC]
RewriteRule ^(.*)$ http://thisdomain.org/$1 [R=301,NC,L]
RewriteCond %{http_host} ^www.thisdomain.com [NC]
RewriteRule ^(.*)$ http://thisdomain.org/$1 [R=301,NC,L]

I tested it and it worked fine. Now www.thisdomain.com and thisdomain.com both got redirected to thisdomain.org. Nice. I could improve this a little by combining the two RewriteCond conditions with the [OR] flag and having only one RewriteRule, but I’ll get to that later.

Then I thought about it some more. I still had duplicate content. Three quarters of the problem had now gone away, with thisdomain.com and www.thisdomain.com being both redirected to thisdomain.org, but thisdomain.org and www.thisdomain.org still shared content. So I went to the directory where the org website operated from and created another .htaccess file.

RewriteEngine On
RewriteCond %{http_host} ^www.thisdomain.org [NC]
RewriteRule ^(.*)$ http://thisdomain.org/$1 [R=301,NC,L]

More tests, and yes, I finally had it. No more duplicate content. All variations pointed to thisdomain.org.
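The three .htaccess rules boil down to one mapping: any of the three non-canonical hosts gets a 301 to the same path on thisdomain.org. This Python sketch writes down the expected result for each host, plus a helper that asks the live server via http.client. thisdomain.org is the placeholder used throughout. Unlike the `^thisdomain.com` patterns, the host set here is an exact match, which is an assumption about the intent.

```python
import http.client

CANONICAL = "thisdomain.org"
ALIASES = {"thisdomain.com", "www.thisdomain.com", "www.thisdomain.org"}


def redirect_target(host, path):
    """Expected 301 Location for a request, or None if already canonical.

    Host matching ignores case (like [NC]) and any :port suffix; `path`
    is the request path including any query string.
    """
    name = host.split(":", 1)[0].lower()
    if name in ALIASES:
        return f"http://{CANONICAL}{path}"
    return None


def check_live(host, path="/"):
    """Ask the real server; returns (status, Location header or None)."""
    conn = http.client.HTTPConnection(host, 80, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For each alias, `check_live(alias, "/about.php")` should come back as `(301, redirect_target(alias, "/about.php"))`.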

So I plodded away and did pretty much whatever the validators suggested.

I added good structure with h1 and h2 tags. I added meta lines. I discovered that a tiled image used for the background was 137k. I didn’t realise this when I set it up. So I converted that image from png to jpg and dropped the quality, and it improved the look and the size went down to 16k. I added height and width attributes to the few static images. I added appropriate alt attributes to images.

Some things I haven’t got around to doing yet. I probably don’t need to create a sitemap.xml, because the site is very small and everything is clearly linked together through the navigation system. I do need to look at minimising my CSS files, either consolidate the CSS by simplifying it, or by reducing whitespace, or gzipping it. I do need to calculate on the fly the dimensions of images so I can apply the height and width attributes in the img tag.
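For the image dimensions, PHP's getimagesize() already returns width and height, so that is probably the easiest route on this site. To show what it involves, here is a Python sketch that reads a PNG's size straight from its IHDR header without decoding the image. It handles PNG only; a JPEG, like the new background, would need a different parser.

```python
import struct
from html import escape

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """(width, height) from a PNG's IHDR chunk, without decoding the image.

    Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width
    and height as big-endian unsigned 32-bit integers.
    """
    with open(path, "rb") as f:
        head = f.read(24)
    if len(head) < 24 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG")
    return struct.unpack(">II", head[16:24])


def img_tag(src, alt, path):
    """XHTML <img> with width and height read from the file itself."""
    width, height = png_size(path)
    return (f'<img src="{escape(src)}" alt="{escape(alt)}" '
            f'width="{width}" height="{height}"/>')
```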

Anyway, I overcame all the important issues, but still have some areas for improvement.

Now I have several other sites to clean up. And any new sites I build will have all this new stuff built into them right from the start.

Two new Mac OS X tricks

I just learned two new tricks from this article from OSXDaily.com.

1. Want to know where a file is located and you only have the icon? Open the Terminal and drop the icon on it. The full pathname is displayed.

2. “say” can do more than I thought.

I have used “say” often. Sometimes I phone home and want to speak to Anne, but she’s on the phone. So I ssh home, ssh from there to her Mac, then type something like

say "I need to talk to you so please get off the phone"

and her Mac will speak those words to her through the speakers. The first time I did it, Anne freaked out a bit and wanted to exorcise the Mac. Now she understands how it’s done and gets irritable when I do it.

But I’ve just learnt that say can do more than just play words to the speakers. Obviously, I should have done “man say” and learnt a lot more.

You can specify an output file with the -o option. Default is AIFF. You can specify different file formats with --file-format=[AIFF|WAVE|caff|m4af], but they say it’s easier to just specify the extension in the output filename and it will work out what you want.

The default voice is the one in your settings, but you can specify different voices with -v. Go to System Preferences -> Speech -> Text To Speech, and click on the pulldown menu for the list of system voices. Might have to select Show More Voices, and then click on the pulldown menu again. You can use any of them. I tried out Zarvox, and that was quite enjoyable.

And you can specify an input file with the -f option. Text files are good, even RTF files. I don’t know for sure how many other file formats it knows. But txt is pretty good.

Combining it all, you can do this sort of thing

say -o shouting.wav -v Zarvox -f ~/Documents/myspeech.txt

This is an interesting little program. I like being able to select the different system voices.
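The say options above, and the ssh trick, can be wrapped in a small Python sketch that only builds argument lists. `anne-mac` and the file paths are hypothetical. The remote command is quoted with shlex.join (Python 3.8+) so the spaces in the message survive the remote shell. Nesting `remote()` twice matches the "ssh home, then ssh to her Mac" hop.

```python
import shlex


def say_cmd(text=None, voice=None, infile=None, outfile=None):
    """Build a `say` argument list using the -v, -o and -f options described above."""
    cmd = ["say"]
    if voice:
        cmd += ["-v", voice]
    if outfile:
        cmd += ["-o", outfile]
    if infile:
        cmd += ["-f", infile]
    elif text:
        cmd.append(text)
    return cmd


def remote(cmd, host):
    """Wrap a command for ssh, quoted so the remote shell sees the same words."""
    return ["ssh", host, shlex.join(cmd)]
```

For example, `remote(remote(say_cmd("I need to talk to you so please get off the phone"), "anne-mac"), "home")` builds the two-hop version.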
