Difference between revisions of "Forced alignment"

From Phonlab

Revision as of 12:12, 10 February 2015

The aligner is an implementation of the Penn forced aligner (Jiahong Yuan), which is based on the HTK speech recognition toolkit. It produces a Praat textgrid file that has word and phone boundaries for the speech in a wav file that you give to the aligner. A BIG time saver. We used this system in the "voices of Berkeley" project to find vowel midpoints and take formant measurements automatically.

It runs on the Dept of Linguistics server using sox and the HTK library of automatic speech recognition software. You may be able to set this up on your home computer, but most people will find it easier to use the server.

How to use the aligner

  1. Your .wav file. The aligner uses sox to create a copy of your wav file that has all of the properties that HTK needs. Keep in mind that if you request the 16 kHz acoustic models but pass an 11.025 kHz file to the aligner, performance will be degraded. Just be sure that the sampling rate of your wav file is at least as high as that of the acoustic models you specify.
  2. Your transcript file. The aligner needs to know what words are spoken in the .wav file, and needs to know the order in which they are spoken (and may also need to know about disfluencies, laughter, etc. if they are there). The transcript file is a text document that contains a transcript of the words spoken in the wav file.
  3. Words you can use in the transcript file. The aligner, by default, uses the pronouncing dictionary that you can see at /opt/f2fa/model/dict. It converts your transcript to all caps before looking up words in the dictionary. If you need a project-specific dictionary (which might include, for example, a set of nonwords, or a set of words in a language other than English), you can create a file named "dict.local" that has the same format as /opt/f2fa/model/dict but contains your project-specific vocabulary. pyalign looks at both the default dictionary and dict.local to find transcriptions of the words in your transcript file.
  4. The unix command (the Penn tool is named align.py; pyalign is just a simple wrapper that makes align.py easier to call in the context of our server):

Command-line usage:

> pyalign [options] wave_file transcript_file output_file

where options may include:

 -r sampling_rate -- override which sample rate model to use, one of 8000, 11025, and 16000
 -s start_time    -- start of portion of wavfile to align (in seconds, default 0)
 -e end_time      -- end of portion of wavfile to align (in seconds, default to end)


The -r option determines which set of acoustic models to use (I would recommend that you use 16000). Your sound file should have a sampling rate that is equal to or greater than the acoustic model sampling rate.
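
The sampling-rate rule above can be checked before you call the aligner. The following is only a minimal sketch: the helper names (wav_sample_rate, best_model_rate, pyalign_command) are made up for illustration and are not part of pyalign, and it assumes your wav file is readable by Python's standard wave module.

```python
import wave

# Model sampling rates listed for the -r option.
MODEL_RATES = (8000, 11025, 16000)


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a wav file header."""
    with wave.open(path, "rb") as w:
        return w.getframerate()


def best_model_rate(file_rate):
    """Return the highest model rate that does not exceed the file's rate,
    or None if the file is sampled below every available model."""
    usable = [r for r in MODEL_RATES if r <= file_rate]
    return max(usable) if usable else None


def pyalign_command(wav, transcript, output, model_rate, start=None, end=None):
    """Build the pyalign argument list in the order shown in the usage line."""
    cmd = ["pyalign", "-r", str(model_rate)]
    if start is not None:
        cmd += ["-s", str(start)]
    if end is not None:
        cmd += ["-e", str(end)]
    return cmd + [wav, transcript, output]
```

On the server you could pass the resulting list to subprocess.run(cmd, check=True). A file sampled at 44100 Hz would be matched with the 16000 models, while a file sampled below 8000 Hz has no suitable model and returns None.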

The output file is a text file that can be read into Praat as a textgrid. From there you can use Praat scripting to extract phonetic measurements, or you can read the textgrid in a python script (see meas_formants for an example) and use the ESPS unix command-line acoustic analysis package to extract phonetic measurements.
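
As a rough illustration of reading the textgrid in python, here is a sketch that pulls intervals out of a textgrid and computes vowel midpoints. It is not the meas_formants script. It assumes the file is in Praat's long text format (lines such as xmin = ..., xmax = ..., text = "..."), that tiers are introduced by lines like item [1]:, and that vowels can be recognized by the stress digit at the end of the phone label. The tier name "phone" used in the usage note is also an assumption, so check the names in your own output.

```python
import re

# One interval: xmin, xmax and a quoted label on consecutive lines.
INTERVAL_RE = re.compile(
    r'xmin\s*=\s*([\d.]+)\s+xmax\s*=\s*([\d.]+)\s+text\s*=\s*"([^"]*)"'
)
TIER_NAME_RE = re.compile(r'name\s*=\s*"([^"]*)"')


def read_tiers(textgrid_text):
    """Return {tier_name: [(start, end, label), ...]} from long-format text."""
    tiers = {}
    # Split at 'item [n]:' lines; the first chunk is the file header.
    chunks = re.split(r"item\s*\[\d+\]\s*:", textgrid_text)
    for chunk in chunks[1:]:
        name = TIER_NAME_RE.search(chunk)
        if name is None:
            continue
        tiers[name.group(1)] = [
            (float(start), float(end), label)
            for start, end, label in INTERVAL_RE.findall(chunk)
        ]
    return tiers


def vowel_midpoints(phone_intervals):
    """Return (label, midpoint) for labels ending in a stress digit."""
    return [
        (label, (start + end) / 2)
        for start, end, label in phone_intervals
        if label[-1:].isdigit()
    ]
```

With the file contents read into a string, vowel_midpoints(read_tiers(text)["phone"]) gives the times at which you might take formant measurements.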

Sharing a dict.local with a google drive spreadsheet

The aligner uses the CMU pronouncing dictionary, which of course does not cover every word that might be uttered in your recording. The aligner will supplement the CMU dictionary with the contents of a file named dict.local in your current working directory, if it exists. You can create a file with this name and add as many records as you like.
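
To find out which words need a dict.local record, you can compare your transcript against the dictionaries before running the aligner. The sketch below assumes each dictionary line starts with the word, followed by whitespace and the pronunciation (as in the CMU dictionary), and it uses a simple tokenizer of its own (letters and apostrophes) that may not match how the aligner splits your transcript. The function names are illustrative only.

```python
import re


def load_dictionary_words(lines):
    """Collect the headword (first whitespace-delimited field) of each line."""
    words = set()
    for line in lines:
        fields = line.split()
        if fields:
            words.add(fields[0].upper())
    return words


def missing_words(transcript, known_words):
    """Return transcript words not found in known_words, in order of first use."""
    # The aligner converts the transcript to all caps before lookup,
    # so do the same here.
    tokens = re.findall(r"[A-Z']+", transcript.upper())
    missing = []
    for token in tokens:
        if token not in known_words and token not in missing:
            missing.append(token)
    return missing
```

You would typically call load_dictionary_words on the lines of the default dictionary plus the lines of dict.local (if present), then pass the combined set to missing_words along with the text of your transcript file.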

You can use a spreadsheet in Google Drive to maintain a dict.local and pull it in with a script. This can be especially convenient if you are collaborating with others, as you can collectively maintain a supplemental dictionary. Here is an example of how to do it, based on Ling113 in spring 2015, using the BCE:

Set up the spreadsheet

  1. Create a Google spreadsheet and share it with everyone in your group as an editor.
  2. Also add share rights so that anyone with the link can view the spreadsheet. If you prefer, make the spreadsheet public on the web.
  3. Add records to the spreadsheet by putting the transcription of a word in the first column and the pronunciation in the second.

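See the Ling113 example: https://docs.google.com/spreadsheets/d/1WwGgZxk5RoU0TAOoJlKPUsoEgZEYjEgucD7zrK3n6Xo/edit?usp=sharing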

  1. Open the spreadsheet and copy the URL from your browser's location bar. The Ling113 example looks like this: https://docs.google.com/a/berkeley.edu/spreadsheets/d/1WwGgZxk5RoU0TAOoJlKPUsoEgZEYjEgucD7zrK3n6Xo/edit#gid=0.
  2. Notice the long alphanumeric string after /d/ in your URL. This is the file key.
  3. Also notice the gid value in your URL. This will probably be '0', but if you have added multiple sheets it might be different. Make sure your current view is the sheet with the records you want to export.
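
If you would rather not pick the file key and gid out of the URL by eye, they can be extracted with a couple of regular expressions. This is a small sketch, with parse_sheet_url as a made-up helper name. Falling back to "0" when the URL carries no gid is an assumption based on the note above that the value will probably be '0'.

```python
import re


def parse_sheet_url(url):
    """Return (file_key, gid) from a Google spreadsheet URL."""
    # The file key is the long alphanumeric string after /d/.
    key = re.search(r"/d/([A-Za-z0-9_-]+)", url)
    if key is None:
        raise ValueError("no /d/<file key> segment found in URL")
    # The gid appears as gid=<digits> after '#', '?' or '&'.
    gid = re.search(r"[#&?]gid=(\d+)", url)
    return key.group(1), (gid.group(1) if gid else "0")
```

Applied to the Ling113 URL shown above, this returns the key 1WwGgZxk5RoU0TAOoJlKPUsoEgZEYjEgucD7zrK3n6Xo and the gid 0.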

Create a download script

  1. Choose a name for your script. In our example here we'll call it get_dict_local. In some cases it might be sensible to make it specific to a project, e.g. get_dict_local_myproject.
  2. Create and edit a script file in your path. This works in BCE: sudo gedit /usr/local/bin/get_dict_local. Use the script name you chose in the first step.
  3. Use the Ling113 example script (https://github.com/rsprouse/ucblingmisc/blob/master/bash/ling113_get_local_dict) as a base for your download script. Just copy and paste into your editor.
  4. Delete the value of the FILEKEY variable in the Ling113 script (the part between quotation marks) and replace it with the file key you found in your spreadsheet's URL.
  5. Delete the value of the GID variable and replace it with your gid value.
  6. Save the changes you made to the script and exit the editor.
  7. Make sure your script is executable. This works in BCE: sudo chmod +x /usr/local/bin/get_dict_local. Make sure you use the script name you chose if it is different from get_dict_local.
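
For readers who prefer python to bash, the same idea can be sketched as follows. This is not the Ling113 script and may differ from what it does. It assumes the spreadsheet is viewable by anyone with the link and uses Google's export URL with format=tsv. The function names, the two-space separator between word and pronunciation, and the upper-casing of the word column are all assumptions made for illustration.

```python
import csv
import io
from urllib.request import urlopen


def export_url(file_key, gid="0"):
    """Build the tab-separated export URL for one sheet of a spreadsheet."""
    return (
        "https://docs.google.com/spreadsheets/d/"
        f"{file_key}/export?format=tsv&gid={gid}"
    )


def tsv_to_dict_lines(tsv_text):
    """Turn rows of (word, pronunciation) into dictionary lines."""
    lines = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        word, pron = row[0].strip(), row[1].strip()
        if word and pron:
            # Assumption: headwords are upper case, as in the CMU dictionary.
            lines.append(f"{word.upper()}  {pron}")
    return lines


def update_dict_local(file_key, gid="0", path="dict.local"):
    """Download the sheet and write it to dict.local (needs network access)."""
    with urlopen(export_url(file_key, gid)) as response:
        tsv_text = response.read().decode("utf-8")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(tsv_to_dict_lines(tsv_text)) + "\n")
```

Calling update_dict_local with your own file key and gid writes dict.local in the current working directory, replacing any existing file of that name.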

Using the script

Using the script is easy. Simply call your script by name at the command line, e.g. get_dict_local, and the dict.local file will be created or updated in your current working directory from the contents of your Google spreadsheet.