Extracting Subtitles from DVDs

Sometimes it is useful to extract the subtitles from a DVD as text that can then be put to pedagogical use, such as providing realistic dialogues as reading exercises in language classes. This tutorial will walk you through the process of extracting subtitles from a DVD and saving them as a text file. The two programs used in the tutorial, YadeX and D-Subtitler, are available on the Macintosh computers in the LRC in Williams Hall. If you own a Macintosh computer, you can also download and use this software yourself.

As implied by the two programs required, extracting subtitles from DVDs is a two-step process. The first step is to save the DVD onto your computer and decrypt it. The second is to extract the subtitles from the decrypted file.

1. Decrypting the DVD

There are many programs for decrypting DVDs that are easier to navigate and use than YadeX. One of the major problems with this piece of software is that it is only available in French. However, it does offer one advantage over most other DVD decryptors: the ability to save the entire DVD to a single file, as opposed to spitting out a folder containing hundreds of files. D-Subtitler has real problems working with folders, so YadeX's single-file option seems to be the only way to go. Here is a step-by-step guide to decrypting a DVD into a format that will be suitable for D-Subtitler.

a) Insert your DVD

This may seem a simple task, but people who have never used a Mac may be flummoxed by the fact that there is no Open or Close button on the DVD drive itself. Instead, use the Eject key in the upper right-hand corner of the keyboard:

Picture of eject key

b) Open "YadeX NoCSS.app"

It will be located in the Applications folder on the Mac and has an icon that looks as follows:

Picture of Yade X icon

When it first opens, the main window will look like the following, listing all of the titles available on your DVD:

Picture of Yade X main window

It can be difficult to tell which title of the DVD you want to use given that there are no names listed here, just the generic terms "Menu" and "Piste." However, on the right of each item in the list, you will see its length. Here we can see that "Piste 2" is 1 hour and 47 minutes long. It is the longest item by far on the disc, and hence it is safe to assume that this is the title we want.

c) Select the parts of the DVD you want

Generally you will want to extract the entire movie, in which case select it by clicking its title, in this case "Piste 2." However, you may only want to extract one or two chapters from the DVD instead of the entire movie. If so, click the small triangle to the left of the title to expand it and show the chapters it contains. Then select only the chapters you want:

Picture of select title dialog

d) Decrypt Your Selection

Once you are satisfied that you have the correct sections of the DVD selected, it is time to decrypt it. First, make sure that the computer you are working on has sufficient free space. DVDs store a lot of data, and if you are extracting an entire movie, it can take up to 8.5 GB. Thus, to be on the safe side, make sure the computer you are working on has at least 10 GB of space available on the hard drive. If you have enough space, in the menu at the top of the screen click "Fichier:Enregistrer le VOB" (File: Save the VOB) as shown below:

Picture of file registration dialog
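The free-space check described above can also be done from a script. The following is only a minimal sketch in Python: the function name has_enough_space is made up for illustration, the 10 GB figure is the safety margin suggested in this tutorial, and 1 GB is assumed to mean 1000^3 bytes.

```python
import shutil

REQUIRED_GB = 10  # safety margin suggested in this tutorial


def has_enough_space(folder, required_gb=REQUIRED_GB):
    """Return True if the disk holding `folder` has at least `required_gb` free.

    Assumes decimal gigabytes (1 GB = 1000**3 bytes).
    """
    free_bytes = shutil.disk_usage(folder).free
    return free_bytes >= required_gb * 1000**3


# "." is the current folder; replace it with the folder where you plan
# to save the .VOB file (for example, your desktop).
print(has_enough_space("."))
```

If this prints False, free up some space before starting the decryption.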

You will then be asked where to save the file. I am going to save it to the desktop and name it after the movie, in my case El Mar:

Picture of Yade X save window

Finally, click "Enregistrer" (Save) to start decrypting the file. A status bar will appear showing you how far along the process is. Depending on the speed of your computer and the length of the movie you are extracting, this process can take anywhere from 5 minutes to an hour:

Picture of decrypting dialog

After the process is finished, you should have a single .VOB file sitting on your computer. In my case, it's on my desktop:

Picture of .vob icon

2. Extracting the Subtitles

a) Open D-Subtitler.app

Now that we have the .VOB file, open D-Subtitler.app. It is in the Applications folder on the Mac and its icon looks like this:

Picture of D-Subtitler icon

Its main window looks as follows:

Picture of D-Subtitler main window

b) Open the .VOB File

With D-Subtitler open, open the .VOB file we just created by clicking "File:Open..." from the menu at the top of the screen:

Picture of D-Subtitler file open dialog

This will bring up the File Open window. Select your .VOB file. In my case it is ElMar.vob saved on the desktop:

Picture of D-Subtitler open window

The file will open in D-Subtitler and it will display a small preview image of the beginning of the movie so you can be sure that you have selected the correct file.

c) Select the Desired Language

Unfortunately, D-Subtitler doesn't name the subtitles that are available. No matter how many subtitle languages are available in the .VOB file, it always displays 20 options (0-19) in the Languages drop-down. Further, these are always labeled 0-19 and never take the name of the language they represent. By default, English is usually the first set of subtitles (0). To find the others, the general rule of thumb is that the order in which languages are listed in the DVD menu when you insert the DVD into a DVD player is the order they will have in D-Subtitler's list of available subtitles. I am going to select English, and so will select language 0:

Picture of D-Subtitler choose language dialog

d) Extract the Subtitles

Clicking on the big green button will begin the extraction process:

Picture of D-Subtitler go button

While the application is working, which usually takes far less time than the decryption process we worked through earlier, it will show you that it is busy:

Picture of D-Subtitler working dialog

e) D-Subtitler Asks for Your Help

About halfway through the extraction process, D-Subtitler will ask for your help to make sure that it is doing the job correctly. First, it will show a test window and ask if you can see text within it. If you can, as in the window below, click Continue. Otherwise, adjust the settings next to "Choice of Gray Level" and click Test until you can see the text. Once you can, click Continue:

Picture of D-Subtitler test view window

D-Subtitler then goes through a very complex process of converting the pictures of the words on the DVD into editable text. This is known as Optical Character Recognition (OCR). During this time, it may once again want to make sure it is identifying words correctly. If it is unsure, it will pop up the following window:

Picture of D-Subtitler choose character dialog

On the left is an image of the character it has found. On the right in the box you can enter the character that it should recognize this as, in this case a colon. Finally, click Validate. However, as in the case above, it may be very difficult to know exactly what character this should be. For example, how did I tell the difference between the offered character being a period or a colon? To see the character in the context of the full sentence to make these calls, click the "preview..." button. D-Subtitler will then show you the exact image in context from the DVD:

Picture of D-Subtitler preview character window

From the image above it becomes clear that it had found a colon. Close the preview window, enter the character that this should be in the box, and click Validate.

Depending on the quality of the subtitles on your DVD, D-Subtitler may ask you for help many times, or not at all.

f) Save .SRT File

Once D-Subtitler has finished converting the images to text, it will ask you where to save the text file. This file is given a .SRT extension by default, as this is what media player applications expect, but ultimately it is simply a text file, so you could just as easily give it the .TXT extension:

Picture of D-Subtitler save dialog

g) Review the File

Finally, D-Subtitler will provide you with a preview of the file allowing you to edit it. This is because, as noted, the program converts the text from images, and may have made some mistakes. Generally it is very accurate and so you shouldn't need to worry. A quick way of checking is simply to run a spell check on the file by clicking the spell check button:

Picture of D-Subtitler .srt edit window

When you are satisfied, save the file and close it.
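Because the saved .SRT file is plain text, it can also be read by a short script. The sketch below is illustrative only. It assumes the usual .SRT layout, in which each subtitle is a block made up of a number line, a time-range line, and one or more lines of text, with blocks separated by a blank line. The function name parse_srt and the sample dialogue are invented for the example.

```python
def parse_srt(text):
    """Split .SRT text into (number, time_range, dialogue) tuples.

    Assumes the common layout: number line, time-range line, text lines,
    then a blank line between subtitles.
    """
    # Normalise Windows and classic Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3:
            continue  # skip blocks that lack a number, a time range, or any text
        number, time_range = lines[0], lines[1]
        dialogue = " ".join(lines[2:])  # join multi-line subtitles
        cues.append((number, time_range, dialogue))
    return cues


# Invented sample in the layout described above.
sample = (
    "1\n"
    "00:00:01,000 --> 00:00:04,000\n"
    "Hello there.\n"
    "\n"
    "2\n"
    "00:00:05,000 --> 00:00:07,500\n"
    "How are you?\n"
    "I'm fine.\n"
)

cues = parse_srt(sample)
for number, time_range, dialogue in cues:
    print(number, time_range, dialogue)
```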

3. (Optionally) Remove Time Stamps

As you will notice from the file above, it contains time stamps telling the computer at exactly what point in the movie to display each subtitle. This is not very user-friendly if you want to distribute the file as an example dialogue. To remove the time stamps we will use a free text editor called Smultron, available from: http://smultron.sourceforge.net/ This program is already installed on the Macs in the LRC, and you can download it to your computer at home if need be. Note, however, that it is Macintosh only.

We are going to remove the time stamps using a system known as Regular Expressions. These are a very powerful way of searching through text files to find pieces of text that match a certain pattern, as opposed to searching for a specific piece of text.

a) Open Smultron.app and the .SRT File

As with all the other applications, it's in the Applications folder. Its icon looks as follows:

Picture of Smultron icon

Next, click "File:Open" from the menu at the top of the screen and open the file that you created in D-Subtitler. In my case it is called file.srt.

b) Advanced Find and Replace

With the file open in Smultron, click "Edit:Advanced Find and Replace..."

Picture of Smultron "Advanced Find and Replace" menu selection

The following window will appear:

Picture of Smultron "Advanced Find and Replace" window

In the box next to "Find:" enter the following text exactly (it may be easiest to cut and paste it):

\n.*\n..:..:..,....-->...:..:..,...

So that the fields look like this (make sure that the "Replace" box is blank):

Picture of Smultron add regular expression dialog

See the above link to an explanation of regular expressions if you are curious as to what this regular expression means.
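As a rough guide, the pattern reads as follows: \n is a line break, .* is the rest of that line (the subtitle number), the second \n is the next line break, and each remaining . stands for any single character, so ..:..:..,... fits a time such as 00:00:05,000 and the single dots on either side of --> fit the spaces. The sketch below tries the same pattern in Python's re module on an invented two-subtitle sample. It assumes the file uses \n line endings, and Smultron's own regular expression engine may differ in details.

```python
import re

# The pattern from the tutorial, written as a Python raw string.
PATTERN = r"\n.*\n..:..:..,....-->...:..:..,..."

# Invented sample in the usual .SRT layout.
sample = (
    "1\n"
    "00:00:01,000 --> 00:00:04,000\n"
    "Hello there.\n"
    "\n"
    "2\n"
    "00:00:05,000 --> 00:00:07,500\n"
    "How are you?\n"
)

# Every piece of text that the pattern matches.
matches = re.findall(PATTERN, sample)

# Replacing each match with nothing, like a blank "Replace" box.
stripped = re.sub(PATTERN, "", sample)

print(matches)
print(stripped)
```

Only the second subtitle's number and time line are matched here. The pattern begins with a line break, and nothing comes before the first subtitle in the file, so that one is left alone.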

Make sure that "Use Regular Expressions" is selected:

Picture of Smultron use regular expressions checkbox

Finally, click the "Replace" button. Smultron may give you a warning seeing as you are about to delete a large part of the file:

Picture of Smultron warning dialog

Click "Delete." Don't worry about this because, as the warning says, you can undo any changes that are made to the file.

Smultron will sit and think for a second, and will then delete all of the time stamps from the file apart from the first one, leaving your text looking something like this:

Picture of final text

Select the first timestamp that Smultron missed and delete it manually with the keyboard. Then, save the file and close Smultron.
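For anyone comfortable with a little scripting, the whole clean-up can also be done outside Smultron. The following is a minimal sketch in Python, not a description of what Smultron does. The name strip_timestamps is made up, the pattern is a stricter variant that only accepts digits in the number and time lines, and the file is assumed to use \n line endings. Because the pattern is anchored to the start of a line rather than to a preceding line break, the first time stamp is removed along with the rest.

```python
import re

# A number line followed by a time-range line, anchored to the start of a line.
CUE_HEADER = re.compile(
    r"^\d+\n\d\d:\d\d:\d\d,\d\d\d --> \d\d:\d\d:\d\d,\d\d\d\n",
    re.MULTILINE,
)


def strip_timestamps(srt_text):
    """Remove every number line and time-range line, keeping the dialogue."""
    return CUE_HEADER.sub("", srt_text)


# Invented sample in the usual .SRT layout.
sample = (
    "1\n"
    "00:00:01,000 --> 00:00:04,000\n"
    "Hello there.\n"
    "\n"
    "2\n"
    "00:00:05,000 --> 00:00:07,500\n"
    "How are you?\n"
)

print(strip_timestamps(sample))
```

To use it on a real file, read the file's contents into a string, pass the string to strip_timestamps, and write the result to a new file so that the original .SRT is kept.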