Difference between revisions of "Multi align examples"

From Phonlab

Latest revision as of 13:38, 30 November 2018

This page illustrates usage of the multi_align command for forced alignment. For the full set of options execute:

multi_align --help

The examples on this page use audio that contains the utterance 'The north wind and the sun', either as a single channel or a stereo recording in which the first two words are in the first channel and the remaining words are in the second channel.

Default behavior of multi_align

The only required argument of multi_align is the name of a .wav file to be aligned. By default the transcript of the audio is expected to be provided by the labels of a textgrid with the same basename as the .wav file and with the extension .TextGrid. The screenshot shows the audio and its associated textgrid.

Annotated audio of 'the north wind and the sun'

If the audio in the screenshot is saved as nws_mono.wav and the textgrid as nws_mono.TextGrid, then the following command performs alignment:

multi_align nws_mono.wav

The resulting textgrid contains three tiers, named 'phone', 'word', and 'trs'. The first contains the phone alignments, the second contains the word alignments, and the last contains the original transcript labels.

Forced alignment of 'the north wind and the sun'

The output filename uses the same basename as the input, with the extension .multi_align.TextGrid.

Specifying a non-default input transcript

The input transcript does not have to have the same basename as the .wav file. Use the --input parameter to specify the name of the input transcript. The basename of --input is used to form the output filename:

multi_align --input nws_mono.v2.TextGrid nws_mono.wav    # output file is nws_mono.v2.multi_align.TextGrid
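
The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described on this page, not code taken from multi_align; the function name expected_output_name is hypothetical, and the sketch says nothing about which directory the output is written to.

```python
from pathlib import Path


def expected_output_name(transcript_name):
    """Return the output textgrid name implied by an input transcript name.

    Hypothetical helper: it only mirrors the naming rule described on this
    page (input basename + '.multi_align.TextGrid').
    """
    # Path.stem drops only the final extension, so 'nws_mono.v2.TextGrid'
    # keeps its '.v2' part.
    return Path(transcript_name).stem + ".multi_align.TextGrid"


print(expected_output_name("nws_mono.TextGrid"))
print(expected_output_name("nws_mono.v2.TextGrid"))
```

A helper like this can be handy in batch scripts that need to check whether an alignment has already been produced before running the aligner again.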

Simple alignment of a single utterance

If the audio file contains a single utterance, you might prefer to skip creating a textgrid file and provide the transcript in a simple text file or on the command line. For example, if you have a simple text file named nws_mono.txt:

The north wind and the sun.

Then you can align by specifying that the --input-type is a text file:

multi_align --input-type text nws_mono.wav

By default multi_align looks for a .txt file that matches the .wav name when --input-type text is used. You can of course override this default with the --input parameter.

A second way to do simple alignment is to include the transcript on the command line. To do this, specify that the --input-type is raw. When raw is used, then the --input parameter should contain the transcript rather than a filename:

multi_align --input-type raw --input 'The north wind and the sun' nws_mono.wav
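
When these commands are launched from a script, passing the arguments as a list avoids shell quoting problems with transcripts that contain spaces or apostrophes. The following is a minimal sketch that assumes multi_align is on your PATH; the helper names build_command and run_alignment are hypothetical, and only the --input-type and --input options shown on this page are used.

```python
import subprocess


def build_command(wav, input_type=None, input_value=None):
    """Assemble a multi_align argument list (hypothetical helper)."""
    cmd = ["multi_align"]
    if input_type is not None:
        cmd += ["--input-type", input_type]
    if input_value is not None:
        # For input type 'raw' this is the transcript itself,
        # otherwise it is the name of the transcript file.
        cmd += ["--input", input_value]
    cmd.append(wav)
    return cmd


def run_alignment(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


cmd = build_command(
    "nws_mono.wav",
    input_type="raw",
    input_value="The north wind and the sun",
)
print(cmd)
# run_alignment(cmd)  # uncomment when multi_align is installed
```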

Aligning multiple speakers

If your transcript contains utterances from multiple speakers, use a separate textgrid tier for each speaker, as in the screenshot:

Annotated audio of 'the north wind and the sun' (two speakers)

The tiers are named 'spkr1' and 'spkr2'. If this textgrid is named nws_mono.TextGrid, then we align with:

multi_align nws_mono.wav

multi_align aligns each tier separately, and the output contains two sets of 'phone', 'word', and 'trs' tiers, each set prefixed with the original tier name and an underscore, e.g. 'spkr1_phone' (notice that Praat interprets '_p' as a subscript character).

Annotated audio of 'the north wind and the sun' (two speakers)
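
The tier naming for multi-tier input can be illustrated as follows. This sketch only restates the prefixing rule described above; the function name output_tier_names is hypothetical, and the order in which multi_align actually writes the tiers is an assumption.

```python
def output_tier_names(input_tiers):
    """List the output tier names expected for several input tiers.

    Hypothetical helper: each input tier yields 'phone', 'word', and 'trs'
    tiers prefixed with the input tier name and an underscore. The ordering
    of the result is an assumption.
    """
    suffixes = ("phone", "word", "trs")
    return [f"{tier}_{suffix}" for tier in input_tiers for suffix in suffixes]


print(output_tier_names(["spkr1", "spkr2"]))
```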

Aligning multiple speakers in multiple channels

If you are fortunate, your recording was made with a dedicated microphone focused on each speaker. For example, you might have two speakers in a stereo recording in which one speaker appears in the left channel and one in the right. The screenshot shows such a recording with each speaker annotated on a separate tier.

Annotated audio of 'the north wind and the sun' (two speakers and two channels)

Use the --tiers option to provide a comma-separated (no spaces!) list of tier names to align. For this example, also suffix each tier name with :N, where N is the number of the audio channel to align (numbering starts with '1'):

multi_align --tiers spk1:1,spk2:2 nws_stereo.wav  # input transcript in nws_stereo.TextGrid

The output file nws_stereo.multi_align.TextGrid contains the result:

Aligned audio of 'the north wind and the sun' (two speakers and two channels)
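
Because the --tiers value must not contain spaces and channel numbering starts at 1, it can be convenient to build the value from a mapping of tier names to channels. The sketch below is one way to do that; the function name tiers_argument is hypothetical, and rejecting tier names that contain whitespace, commas, or colons is a conservative assumption rather than documented multi_align behavior.

```python
def tiers_argument(tier_channels):
    """Build a --tiers value such as 'spk1:1,spk2:2' (hypothetical helper).

    tier_channels maps each tier name to a 1-based audio channel number.
    """
    parts = []
    for tier, channel in tier_channels.items():
        # Assumption: names with separators or whitespace cannot be
        # expressed safely in the comma-separated list.
        if any(ch.isspace() or ch in ",:" for ch in tier):
            raise ValueError(f"unsupported tier name: {tier!r}")
        if not isinstance(channel, int) or channel < 1:
            raise ValueError(f"channel numbers start at 1, got {channel!r}")
        parts.append(f"{tier}:{channel}")
    return ",".join(parts)


print(tiers_argument({"spk1": 1, "spk2": 2}))
```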

Handling errors

There are many reasons why alignment of one or more of your textgrids might fail: noisy audio, mistranscription, missing words in the dictionary, unrecognized sequence of phones, etc. When an error occurs, the error message returned from pyalign is stored in the 'phone' and 'word' tiers.

As an example, if the rough transcript is poorly synchronized with the audio:

Poorly annotated audio of 'the north wind and the sun'

then the poorly synchronized portion will not align and an error will appear:

Poorly aligned audio of 'the north wind and the sun' with error

The error message reads:

**ERROR** Command '['pyalign', '-s', '1.1477972199646949', '-e', '1.2788045196338285', '-c', '1', 'nws_mono.wav', 'temp_transcript.txt', 'temp_textgrid.TextGrid']' returned non-zero exit status 1
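
If many labels fail, it can help to extract the details from such a message automatically. The sketch below parses the message format shown above; the function name parse_error_label is hypothetical, and reading -s, -e, and -c as start time, end time, and channel is an assumption based on the values in the example, not on pyalign documentation.

```python
import ast
import re

# Matches the message layout shown above.
ERROR_RE = re.compile(
    r"\*\*ERROR\*\* Command '(\[.*\])' returned non-zero exit status (\d+)"
)


def parse_error_label(label):
    """Return details of a failed pyalign call, or None if no error is found."""
    match = ERROR_RE.search(label)
    if match is None:
        return None
    # The bracketed part is a Python-style list of strings.
    argv = ast.literal_eval(match.group(1))
    options = {}
    for flag, value in zip(argv, argv[1:]):
        if flag in ("-s", "-e", "-c"):
            options[flag] = value
    return {
        "argv": argv,
        "exit_status": int(match.group(2)),
        # Assumed meanings: -s start (s), -e end (s), -c channel.
        "start": float(options["-s"]) if "-s" in options else None,
        "end": float(options["-e"]) if "-e" in options else None,
        "channel": int(options["-c"]) if "-c" in options else None,
    }


label = (
    "**ERROR** Command '['pyalign', '-s', '1.1477972199646949', "
    "'-e', '1.2788045196338285', '-c', '1', 'nws_mono.wav', "
    "'temp_transcript.txt', 'temp_textgrid.TextGrid']' "
    "returned non-zero exit status 1"
)
info = parse_error_label(label)
print(info["start"], info["end"], info["channel"])
```

In this example the interval passed to pyalign is only about 0.13 seconds long, which is consistent with a label whose boundaries do not cover the words it contains.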

Alignment of the first part of the transcript succeeded already and does not need to be fixed. The second part does require attention, however, and we can fix this error by providing a better rough alignment of the transcript. To accomplish this, fix the label boundaries and prefix **REALIGN** to sections of the transcript which require realignment:

Audio of 'the north wind and the sun' prepared for realignment
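
If you mark labels from a script rather than by hand in Praat, the prefixing step might look like the sketch below. The function name mark_for_realign is hypothetical. The marker is written with a trailing space, as in the source of this page; whether multi_align requires that space is not stated here.

```python
REALIGN_PREFIX = "**REALIGN** "


def mark_for_realign(label_text):
    """Prefix a transcript label with the realign marker (hypothetical helper).

    Labels that already start with the marker are returned unchanged, so the
    function can be applied more than once without stacking prefixes.
    """
    if label_text.startswith(REALIGN_PREFIX.strip()):
        return label_text
    return REALIGN_PREFIX + label_text


print(mark_for_realign("and the sun"))
```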

Then run multi_align with --input-type realign. The default input textgrid is expected to be the output of a previous run of multi_align and to be suffixed .multi_align.TextGrid (take care when saving a textgrid edited in Praat, which may suggest a filename that replaces '.' with '_').

multi_align --input-type realign nws_mono.wav

Realignment overwrites the existing .multi_align.TextGrid file with one that removes the bad sections of the textgrid and replaces them with the new aligner results. (A backup of the previous textgrid is stored as a hidden file, i.e. as the original filename prefixed by '.' and suffixed by '.N', where 'N' is an integer.)

Audio of 'the north wind and the sun' after realignment

Notice that the parts of the 'trs' tier that were well-aligned already are no longer in the 'trs' tier. The reason for this behavior is to make it easier to find the parts of the transcript that were realigned. The previously-aligned 'phone' and 'word' tiers are combined with the newly-aligned sections to create complete tiers.
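
Since the backups made during realignment are hidden files, they are easy to overlook. The following sketch lists them using the naming scheme described above ('.' + original filename + '.N'). The function names backup_name and find_backups are hypothetical, and how multi_align chooses 'N' is not specified on this page, so the sketch simply sorts by that number.

```python
import re
from pathlib import Path


def backup_name(filename, n):
    """Return the hidden backup name for a textgrid filename and integer n."""
    return f".{filename}.{n}"


def find_backups(textgrid_path):
    """Return existing backups of a textgrid, sorted by their integer suffix."""
    textgrid_path = Path(textgrid_path)
    pattern = re.compile(r"^\." + re.escape(textgrid_path.name) + r"\.(\d+)$")
    found = []
    for entry in textgrid_path.parent.iterdir():
        match = pattern.match(entry.name)
        if match:
            found.append((int(match.group(1)), entry))
    return [path for _, path in sorted(found)]


print(backup_name("nws_mono.multi_align.TextGrid", 1))
# for path in find_backups("nws_mono.multi_align.TextGrid"):
#     print(path)
```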