Extracting Subtitles from DVDs
Sometimes it is useful to extract the subtitles from a DVD as text that can be put to other pedagogical uses, such as providing realistic dialogues for reading exercises in language classes. This tutorial walks you through extracting subtitles from a DVD and saving them as a text file. The software mentioned in the tutorial is available on the Macintosh computers in the LRC in Williams Hall. If you own a Macintosh computer, you can also download the software yourself from the following locations:
- YadeX: Used for decrypting the DVD. Freeware - http://www.macetvideo.com/yadex10/yadex.html
- D-Subtitler: Extracts the subtitles. Freeware - http://www.objectifmac.com/dsubtitler.php
As the two programs suggest, extracting subtitles from DVDs is a two-step process. The first step is to save the DVD onto your computer and decrypt it. The second is to extract the subtitles from the decrypted file.
Step 1: Decrypting the DVD
Many programs for decrypting DVDs are easier to navigate and use than YadeX. One of the major problems with YadeX is that it is only available in French. However, it offers one advantage over most other DVD decryptors: it can save the entire DVD to a single file, as opposed to spitting out a folder containing hundreds of files. D-Subtitler has real problems working with folders, so YadeX's single-file option seems to be the only way to go. Here is a step-by-step guide to decrypting a DVD into a format suitable for D-Subtitler.
a) Insert your DVD
This may seem a simple task, but people who have never used a Mac may be flummoxed by the fact that there is no Open or Close button on the DVD drive itself. Instead, use the Eject button in the upper right-hand corner of the keyboard:
b) Open "YadeX NoCSS.app"
It will be located in the Applications folder on the Mac and has an icon that looks as follows:
When it first opens, the main window will look like the following, listing all of the titles available on your DVD:
It can be difficult to tell which title of the DVD you want to use given that there are no names listed here, just the generic terms "Menu" and "Piste." However, on the right of each item in the list, you will see its length. Here we can see that "Piste 2" is 1 hour and 47 minutes long. It is the longest item by far on the disc, and hence it is safe to assume that this is the title we want.
c) Select the parts of the DVD you want
Generally you will want to extract the entire movie. In that case, select it by clicking its title, here "Piste 2." However, you may only want one or two chapters rather than the entire movie. If so, click the small triangle to the left of the title to expand it and show the chapters it contains. Then select only the chapters you want:
d) Decrypt Your Selection
Once you are satisfied that you have the correct sections of the DVD selected, it is time to decrypt them. First, make sure that the computer you are working on has sufficient free space. DVDs store a lot of data, and an entire movie can take up to 8.5 GB. To be on the safe side, make sure that at least 10 GB is available on the hard drive. If you have enough space, click "Fichier:Enregistrer le VOB" in the menu at the top of the screen.
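The free-space check can also be scripted. The sketch below is not part of the original workflow: it assumes Python 3 is available on your machine, and the function name has_enough_space and its 10 GB default are illustrative choices based on the guideline above.

```python
import shutil


def has_enough_space(path=".", required_gb=10):
    """Return True if the drive holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    # 1 GB is counted as 1000**3 bytes here; this is a rough safety margin, not an exact figure.
    return free_bytes >= required_gb * 1000**3


if __name__ == "__main__":
    # "." is the current folder; point this at the folder where you plan to save the .VOB file.
    print("Enough space:", has_enough_space(".", 10))
```

If the script reports False, free up some space or choose a different drive before starting the decryption.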
You will then be asked where to save the file. I am going to save it to the desktop and name it after the movie, in my case El Mar:
Finally, click "Enregistrer" to start decrypting the file. A status bar will appear showing you how far along the process is. Depending on the speed of your computer and the length of the movie you are extracting, this process can take anywhere from 5 minutes to an hour:
After the process is finished, you should have a single .VOB file sitting on your computer. In my case, it's on my desktop:
Step 2: Extracting the Subtitles
a) Open D-Subtitler.app
Now that we have the .VOB file, open D-Subtitler.app. It is in the Applications folder on the Mac, and its icon looks like this:
Its main window looks as follows:
b) Open the .VOB File
With D-Subtitler open, open the .VOB file we just created by clicking "File:Open..." from the menu at the top of the screen:
This will bring up the File Open window. Select your .VOB file. In my case it is ElMar.vob saved on the desktop:
The file will open in D-Subtitler and it will display a small preview image of the beginning of the movie so you can be sure that you have selected the correct file.
c) Select the Desired Language
Unfortunately, D-Subtitler doesn't name the subtitles that are available. No matter how many subtitle languages the .VOB file contains, the Languages drop-down always displays 20 options, labeled 0-19, and never the names of the languages they represent. By default, English is usually the first set of subtitles (0). To find the others, as a general rule of thumb, the order in which the languages are listed in the DVD menu when you insert the DVD into a DVD player is the order they will have in D-Subtitler's list. I am going to select English, and so will select language 0:
d) Extract the Subtitles
Clicking on the big green button will begin the extraction process:
Whilst the application is working, it will show you that it is busy. This step is usually a lot faster than the decryption process we worked through earlier:
e) D-Subtitler Asks for Your Help
About halfway through the extraction process, D-Subtitler will ask for your help to make sure that it is doing the job correctly. First, it will display a window and ask whether you can see text within it. If you can, as in the window below, click Continue. Otherwise, adjust the settings next to "Choice of Gray Level" and click Test until you can see the text. Once you can, click Continue:
D-Subtitler then goes through a very complex process of converting the pictures of the words on the DVD into editable text. This is known as Optical Character Recognition (OCR). During this time, it may once again want to make sure it is identifying words correctly. If it is unsure, it will pop up the following window:
On the left is an image of the character it has found. On the right in the box you can enter the character that it should recognize this as, in this case a colon. Finally, click Validate. However, as in the case above, it may be very difficult to know exactly what character this should be. For example, how did I tell the difference between the offered character being a period or a colon? To see the character in the context of the full sentence to make these calls, click the "preview..." button. D-Subtitler will then show you the exact image in context from the DVD:
From the image above it becomes clear that it had found a colon. Close the preview window, enter the character that this should be in the box, and click Validate.
Depending on the quality of the subtitles on your DVD, D-Subtitler may ask you for help many times, or not at all.
f) Save .SRT File
Once D-Subtitler has finished converting the images to text, it will ask you where to save the text file. This file is given a .SRT extension by default, as this is what media player applications expect, but ultimately it is simply a text file, so you could just as easily give it the .TXT extension:
g) Review the File
Finally, D-Subtitler will provide you with a preview of the file allowing you to edit it. This is because, as noted, the program converts the text from images, and may have made some mistakes. Generally it is very accurate and so you shouldn't need to worry. A quick way of checking is simply to run a spell check on the file by clicking the spell check button:
When you are satisfied, save the file and close it.
Step 3: (Optionally) Remove Timestamps
As you will notice from the file above, it contains timestamps telling the computer exactly when in the movie to display each subtitle. This is not very user-friendly if you want to distribute the file as an example dialogue. To remove the timestamps we will use a free text editor called Smultron, available from http://smultron.sourceforge.net/. This program is already installed on the Macs in the LRC, and you can download it to your computer at home if need be. Note, however, that it is Macintosh only.
We are going to remove the timestamps using a system known as Regular Expressions. These are a very powerful way of searching through text files to find pieces of text that match a certain pattern, as opposed to searching for one specific piece of text.
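To make the difference between a literal search and a pattern search concrete, here is a small sketch in Python (an assumption; the tutorial itself only uses Smultron). The sample text is made up and follows the usual SubRip (.SRT) layout of a sequence number, a timestamp line, and the subtitle text. The pattern is only an illustration and is not necessarily the expression used later in this tutorial.

```python
import re

# Made-up sample in the usual SubRip layout: number, timestamp line, text, blank line.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hello.

2
00:00:04,000 --> 00:00:06,000
How are you?
"""

# A literal search only finds this one exact piece of text.
literal_hits = SAMPLE.count("00:00:01,000 --> 00:00:03,500")

# A pattern describes the shape of a timestamp line: \d{2} means "two digits",
# \d{3} means "three digits", and ^ and $ mark the start and end of a line.
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$",
    re.MULTILINE,
)
pattern_hits = TIMESTAMP.findall(SAMPLE)

print(literal_hits)       # 1
print(len(pattern_hits))  # 2
```

The literal search finds one timestamp because only one line contains that exact text, while the pattern finds both because both lines have the same shape.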
a) Open Smultron.app and the .SRT File
As with all the other applications, it's in the Applications folder. Its icon looks as follows:
Next, click "File:Open" from the menu at the top of the screen and open the file that you created in D-Subtitler. In my case it is called file.srt.
b) Advanced Find and Replace
With the file open in Smultron, click "Edit:Advanced Find and Replace..."
The following window will appear:
In the box next to "Find:" enter the following text exactly (it may be easiest to cut and paste it):
So that the fields look like this (make sure that the "Replace" box is blank):
See the above link for an explanation of regular expressions if you are curious as to what this regular expression means.
Make sure that "Use Regular Expressions" is selected:
Finally, click the "Replace" button. Smultron may give you a warning seeing as you are about to delete a large part of the file:
Click "Delete." Don't worry about this because, as the warning says, you can undo any changes made to the file.
Smultron will sit and think for a second, and will then delete all of the timestamps from the file apart from the first one, leaving your text looking something like this:
Select the first timestamp that Smultron missed and delete it manually with the keyboard. Then, save the file and close Smultron.
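If you would rather script this step, the following Python sketch removes the sequence number and timestamp lines in one pass, including the first one. It is an alternative to the Smultron steps above, not a reproduction of them: the pattern was written for this sketch, and it assumes that the file follows the usual SubRip layout and that you want the sequence numbers removed along with the timestamps. The file names in the commented-out call and the UTF-8 encoding are also assumptions.

```python
import re

# Matches one block header: a line holding only a sequence number,
# followed by a timestamp line such as "00:00:01,000 --> 00:00:03,500".
BLOCK_HEADER = re.compile(
    r"^\d+[ \t]*\r?\n"
    r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}[^\n]*\n",
    re.MULTILINE,
)


def strip_timestamps(srt_text):
    """Return the subtitle text with every block header removed."""
    return BLOCK_HEADER.sub("", srt_text)


def strip_file(source_path, target_path, encoding="utf-8"):
    """Read an .SRT file and write a copy without block headers.

    The encoding is an assumption; change it if accented characters look wrong.
    """
    with open(source_path, encoding=encoding) as source:
        cleaned = strip_timestamps(source.read())
    with open(target_path, "w", encoding=encoding) as target:
        target.write(cleaned)


# Hypothetical file names; adjust them to match your own files.
# strip_file("file.srt", "dialogue.txt")
```

Because the script writes to a new file, the original .SRT file stays intact in case you still need the timestamps for a media player.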