Difference between revisions of "Sox in phonetic research"

From Phonlab
Revision as of 13:52, 8 October 2013

# Using `sox` in phonetic research

`sox` is a great command line tool for working with audio files, and this page illustrates some of the ways phoneticians can make use of it in their research. Once you have learned its syntax you can perform a wide variety of audio processing tasks, and these can easily be incorporated into a batch processing script to handle hundreds or thousands of audio files.

In all of these examples, input and output files are assumed to be single-channel, unless specified otherwise. The Bash shell is also assumed, though most of the examples will work in other shells.

These are not the only possible usages, and you should consult the [`sox` documentation](http://sox.sourceforge.net/Docs/Documentation) for additional effects and options.

## `sox` command line basics

The basic syntax of a `sox` command is:

   > sox input1 [input2..n] output [effect]

An English translation of this is: 'Exchange this file [these files] into that file [by means of some effect].' (The name `sox` derives from 'Sound eXchange'.) Portions of the command surrounded by `[]` are optional.

It's important to note that the `sox` command always looks for at least one input file and an output file, even when it's not obvious what those files should be. For example, when you want `sox` to generate one second of white noise and don't need to copy sample rate, etc. from some other file, then specifying an input file may not make sense. In such an event, you can substitute the special filename `-n` where the input file normally goes in the command line to tell `sox` not to look for a real input file.

Other times you might not want to create a real output file, for example, if you want the output of one `sox` command to be the input for a second `sox` command without creating an intermediate file. In that case you can use the special filename `-p` in place of the output filename to tell `sox` that the output will be piped to another command's input.

## Converting audio file types, formats, and channels

### Convert file type

This converts an .au file to .wav:

   > sox input.au output.wav

### Change sample rate

This resamples `input.wav` and creates a new file with a sample rate of 22050Hz:

   > sox input.wav -r 22050 output.wav

### Change sample rate and data size

This resamples `input.wav` to 22050Hz and stores the sample values in the new file at 8 bits/sample:

   > sox input.wav -r 22050 -b 8 output.wav

### Change sample rate and file type

This converts .wav to .aiff and resamples the signal to 12000Hz:

   > sox input.wav -r 12000 output.aiff

### Override a sample rate

Occasionally an audio file's header misidentifies the sample rate of the signal because of an error when the file was created or processed, or the file is a raw audio file with no header at all. You can override the header sample rate value by specifying the real sample rate as an input file option:

   > sox -r 22050 input.wav output.wav

In this example the signal in `output.wav` contains the same number of samples as `input.wav` (i.e. it is *not* resampled), and the value of the sample rate property in the header will be 22050Hz, regardless of the sample rate value in `input.wav`'s header.
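Any of the conversions in this section can be wrapped in a loop for batch processing. The following is a minimal sketch, not a definitive script: the function name `resample_dir`, the directory arguments, and the `SOX` override variable are all hypothetical names introduced for illustration.

```shell
# Hypothetical helper: resample every .wav file in a directory.
# SOX is an assumed override variable; set SOX=echo to preview the
# commands without running them.
SOX="${SOX:-sox}"

# resample_dir INDIR OUTDIR RATE
# The body runs in a subshell ( ... ) so its variables stay local.
resample_dir() (
    indir=$1
    outdir=$2
    rate=$3
    mkdir -p "$outdir"
    for f in "$indir"/*.wav; do
        [ -e "$f" ] || continue    # skip if the glob matched nothing
        # -r after the input filename is an output option, so this resamples
        "$SOX" "$f" -r "$rate" "$outdir/$(basename "$f")"
    done
)

# Usage:
#   resample_dir recordings resampled 22050
```

Writing the results to a separate directory keeps the original recordings untouched, which makes it safe to rerun the loop with different settings.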

### Average two stereo channels into a single mono channel

   > sox input.wav -c 1 output.wav avg

Here `input.wav` is a two-channel audio file. The `-c 1` output format selects a single-channel output, and the `avg` effect averages the two input channels.

### Remove a channel from a stereo recording to make a mono audio file

Some digital recorders always make stereo recordings, regardless of whether you use two microphones or not, resulting in a recording that either (1) has one channel with no audio other than amplifier noise; or (2) has two identical channels. In either case it is efficient to remove the unneeded channel from the recording. Since your new file contains only half the data (not counting the file header), processing will be easier and faster, and you'll only need about half the storage to save the file.

To remove a channel you tell `avg` to copy only one of the channels:

   > sox input.wav -c 1 output.wav avg -l

This copies only the left channel to the output file and drops the right one. Use `avg -r` to do the opposite.
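Some `sox` installations may not provide the `avg` effect. If yours rejects it, the `remix` effect is one possible substitute; whether you need it depends on your installed version, so treat the following as a sketch under that assumption and check your `sox` man page. The function names and the `SOX` override variable are hypothetical.

```shell
# SOX is an assumed override variable; set SOX=echo for a dry run.
SOX="${SOX:-sox}"

# Mix channels 1 and 2 of a stereo file down to a single channel.
stereo_to_mono() {
    "$SOX" "$1" "$2" remix 1,2
}

# Keep only the left (first) channel.
keep_left() {
    "$SOX" "$1" "$2" remix 1
}

# Keep only the right (second) channel.
keep_right() {
    "$SOX" "$1" "$2" remix 2
}

# Usage:
#   keep_left input.wav output.wav
```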

## Synthesis

`sox` has a `synth` effect for generating tones or wideband noise that can be used in standalone form, or concatenated or mixed with an existing audio file.

### Generate a periodic waveform

This generates a 2.25-second sine wave at 300Hz (sample rate 12050Hz; 16-bit signed integer):

   > sox -n -r 12050 -b 16 -s output.wav synth 2.25 sine 300

### Generate the sum of two periodic waveforms

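This chains two `synth` effects to add a 250Hz square wave to the sine wave. The `mix` keyword tells the second `synth` to add its output to the existing signal; without it, `synth` replaces the signal it receives:

   > sox -n -r 12050 -b 16 -s output.wav synth 2.25 sine 300 synth 2.25 square mix 250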

See the [`sox` documentation](http://sox.sourceforge.net/Docs/Documentation) for additional options on creating swept rather than fixed frequency tones, and specifying phase, dc offset, and more.

### Generate an aperiodic waveform (noise)

This produces a 3.5-second file with a 44.1kHz sample rate at 16 bits/sample. The `-n` parameter tells `sox` that no input file is used to determine the duration or data format:

   > sox -n -r 44100 -b 16 noise.wav synth 3.5 whitenoise

You can also generate noise with different spectral characteristics by using `brownnoise` or `pinknoise` instead of `whitenoise`.

### Generate an aperiodic waveform to match an audio file

You will often want to produce a noise file that matches the recording parameters of some other file, and for that you specify an input file:

   > sox input.wav noise.wav synth whitenoise

This produces an output file that has the same number of samples (duration) and sample rate as the input file.

### Add noise to an audio file

You can combine two `sox` commands to add noise to an input file and write the result to a new output file. These commands (1) generate noise matching the duration and sample rate of the input file and then (2) mix the generated noise with the input file to produce the new output:

   > sox input.wav -p synth whitenoise vol 0.02 | sox -m input.wav - addednoise.wav

In the above example the `-p` parameter replaces the output file name in the first command and tells `sox` to pipe the noise output to the next `sox` command instead of to a file. This piped data stream shows up in the second command as the `-` argument, where the `-m` option tells `sox` to mix it with `input.wav` to produce the output file `addednoise.wav`.

Mixing sums the sample values of the input files, which introduces the risk of clipping in your output file. According to the `sox` manual, when no `-v` options are given `sox -m` guards against this by scaling each of the n inputs by 1/n, so in the example above both the original signal and the noise are halved. If you set the levels yourself with `-v`, clipping is not a problem as long as the amplitude range of your input file leaves enough room for the noise that you want to add. For instance, if the amplitude values in your input file have a maximum/minimum of +/-0.6 you can add noise with a range of +/-0.4 with no danger of clipping. If you want to add more noise than that you will need to apply the `vol` gain effect or the `norm` effect to prevent clipping.
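To find out how much room a file leaves for added noise, one option is to read its peak amplitude from the report printed by the `stat` effect. This sketch assumes that the report contains `Maximum amplitude:` and `Minimum amplitude:` lines; `peak_from_stat` and `headroom` are hypothetical helper names, and `awk` is assumed to be available.

```shell
# Read a `sox FILE -n stat` report on stdin and print the largest
# absolute sample value found in the Maximum/Minimum amplitude lines.
peak_from_stat() {
    awk -F: '
        /^(Maximum|Minimum) amplitude/ {
            v = $2 + 0              # convert the padded field to a number
            if (v < 0) v = -v       # absolute value
            if (v > peak) peak = v
        }
        END { printf "%.6f\n", peak }
    '
}

# Print the amplitude range left over for noise: 1.0 minus the peak.
headroom() {
    awk -v p="$1" 'BEGIN { printf "%.6f\n", 1.0 - p }'
}

# Usage (stat writes its report to STDERR, hence 2>&1):
#   peak=$(sox input.wav -n stat 2>&1 | peak_from_stat)
#   headroom "$peak"
```

For the example in the text, a peak of 0.6 gives a headroom of 0.4.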

An alternative syntax for the previous command is to use the shell's process substitution construct `<()` to create the noise as the second input file to `sox -m`:

   > sox -m input.wav <(sox input.wav -p synth whitenoise vol 0.02) addednoise.wav

## Adjusting the volume

### The `norm` effect

### The `vol` effect

The `vol` effect amplifies or attenuates the audio signal.

By default `sox` applies a linear scale to the signal. `vol` values greater than 1 result in amplification, and values between 0 and 1 result in attenuation. This example doubles the sample values of a signal:

   > sox input.wav output.wav vol 2

One way to maximize the amplitude of a signal without clipping is to use the `stat` effect to retrieve a multiplier that will result in a maximized signal and pass it to the `vol` effect:

   > sox input.wav output.wav vol $(sox input.wav -n stat -v 2>&1)

In the above example two calls to `sox` are necessary. The second call is performed via the shell's command substitution syntax `$()` and retrieves the multiplier (which `sox` outputs to `STDERR`, which we redirect to `STDOUT` with the `2>&1` syntax), and this multiplier is used as the literal value of the `vol` effect in the first command.

Note that the above complex command achieves the same result as the simple `norm` effect:

   > sox input.wav output.wav norm

The complex command may be faster than `norm` since it does not create a temporary file during processing, which `norm` does. For long audio files this advantage could be considerable. The complex command might not work for some types of encodings (PCM should be fine), and the `2>&1` syntax is not available in all shells.
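Because the multiplier is captured from `STDERR`, an error message could end up where `vol` expects a number. A cautious script can check the captured text first. This is a minimal sketch: `is_number`, `maximize`, and the `SOX` override variable are hypothetical names introduced for illustration.

```shell
# SOX is an assumed override variable pointing at the sox binary.
SOX="${SOX:-sox}"

# Succeed only if the argument looks like a plain decimal number.
is_number() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?$'
}

# maximize IN OUT
# Scale IN to full amplitude, refusing to run `vol` if `stat -v`
# returned something other than a number. Runs in a subshell so that
# `gain` stays local and `exit` only leaves the function.
maximize() (
    gain=$("$SOX" "$1" -n stat -v 2>&1)
    if is_number "$gain"; then
        "$SOX" "$1" "$2" vol "$gain"
    else
        echo "unexpected stat -v output for $1: $gain" >&2
        exit 1
    fi
)

# Usage:
#   maximize input.wav output.wav
```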

### Scale the amplitude of an audio file so that the max/min values fit within a specific portion of the possible range

Here's a more complicated example. Say we want to scale our audio signal so that sample values are in the range +/-0.8 rather than the allowed maximum range +/-1.0, perhaps because we want to add noise with an amplitude range of +/-0.2. To do this we can use the `stat` effect to grab the `vol` multiplier and multiply it by 0.8:

   > sox input.wav output.wav vol $(echo "$(sox input.wav -n stat -v 2>&1) * 0.8" | bc -l)

This example makes use of nested command substitution to first retrieve, then scale, the multiplier, which is subsequently used as the literal value of the `vol` effect. The `bc` calculator must be available on your system in order to perform the multiplication.
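If `bc` is not installed, `awk` can do the same multiplication. The following is a sketch of that alternative, assuming `awk` is available; `scaled_gain` is a hypothetical helper name.

```shell
# scaled_gain MULTIPLIER FRACTION
# Print MULTIPLIER * FRACTION with six decimal places, using awk
# in place of bc.
scaled_gain() {
    awk -v g="$1" -v f="$2" 'BEGIN { printf "%.6f\n", g * f }'
}

# Usage:
#   gain=$(scaled_gain "$(sox input.wav -n stat -v 2>&1)" 0.8)
#   sox input.wav output.wav vol "$gain"
```

Storing the result in a variable first, as in the usage lines, also makes it easy to print and inspect the value before applying it.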

### Add noise to an audio file without clipping

Putting it all together, here's one way to add noise to an audio file and be sure to avoid clipping. Note that the command is broken into multiple lines for readability. The `\` character tells the shell that the command continues on the following line.

   > sf1=0.94; sf2=0.05; \
     sox -m \
     -v $(echo "$(sox input.wav -n stat -v 2>&1) * ${sf1}" | bc -l)  input.wav \
     -v ${sf2} <(sox input.wav -p synth whitenoise) \
     -b 16 output.wav

Let's break this down. This complex command mixes two input files produced by subcommands. The basic form of a mixing command is:

   > sox -m [-v factor1] input1.wav [-v factor2] input2.wav output.wav

Here `input1.wav` and `input2.wav` are mixed to produce `output.wav`. If provided, the `-v factorN` arguments linearly scale the corresponding input files.

In our complex command the first line

   > sf1=0.94; sf2=0.05;

sets the value of two shell variables, the scaling factors `sf1` and `sf2`. These are used to scale the amplitudes of the two input files.

The second line

   > sox -m

tells `sox` to look for multiple input files to mix into an output file.

The third line identifies the first input file, which will be an amplitude-scaled version of our real input file:

   > -v $(echo "$(sox input.wav -n stat -v 2>&1) * ${sf1}" | bc -l)  input.wav

As we saw earlier, the `$(echo "$(sox input.wav -n stat -v 2>&1) * ${sf1}" | bc -l)` construct creates a gain multiplier that scales a signal to a fraction (`${sf1}`; 0.94) of the maximum possible amplitude. This multiplier is used as the literal value of the `-v` argument, scaling `input.wav` so that it uses 94% of the allowed amplitude range.

The next line creates and scales a noise input based on the duration and sample rate of `input.wav`:

   > -v ${sf2} <(sox input.wav -p synth whitenoise)

Here the shell's process substitution construct `<()` is used to chain the output of a nested command to the input of the enclosing `sox -m` command. The special output filename `-p` is used in place of a real output filename so that the output data can be used as input to the enclosing `sox -m` command. Since `synth` generates full-amplitude noise by default, we use `-v ${sf2}` to scale the noise to 5% (0.05) of the maximum.

Note the subtle difference between process substitution, used in this line, and command substitution `$()`, used in the preceding line. Command substitution produces a string to be used as a literal value in the enclosing command. Process substitution is used to pipe data to the enclosing command. The point is subtle because `sox` can use either a literal filename or a data stream as an input file.

The final line identifies the output filename and data size:

   > -b 16 output.wav

The `-b 16` format is important here, as your output will have a data size of 32 bits/sample without it, which is more than you need, and other software might not be able to deal with the deeper bit depth. The reason this happens is that the `-p` special filename pipes data using `sox`'s internal 32-bit format, and this makes the mixing command default to the same.
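For batch use, the whole recipe can be wrapped in a function that also checks the scale factors, since the mix stays within range only while `sf1 + sf2` is at most 1.0 (0.94 + 0.05 = 0.99 above). This is a sketch under stated assumptions rather than a definitive script: `factors_fit`, `add_noise`, and the `SOX` override variable are hypothetical names, `awk` replaces `bc`, and the pipe form from the earlier noise example replaces process substitution.

```shell
# SOX is an assumed override variable pointing at the sox binary.
SOX="${SOX:-sox}"

# Succeed if the two scale factors sum to at most 1.0.
factors_fit() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + b <= 1.0) }'
}

# add_noise SRC DST [SF1] [SF2]
# Mix SRC (scaled to SF1 of full range) with white noise (scaled to SF2)
# and write a 16-bit result to DST. Runs in a subshell to keep variables local.
add_noise() (
    src=$1
    dst=$2
    sf1=${3:-0.94}
    sf2=${4:-0.05}
    factors_fit "$sf1" "$sf2" || { echo "sf1 + sf2 exceeds 1.0" >&2; exit 1; }
    # Multiplier that would maximize SRC, then scaled down to SF1.
    peak_gain=$("$SOX" "$src" -n stat -v 2>&1)
    gain=$(awk -v g="$peak_gain" -v f="$sf1" 'BEGIN { printf "%.6f", g * f }')
    # Generate matching noise and pipe it in as the second mix input (-).
    "$SOX" "$src" -p synth whitenoise |
        "$SOX" -m -v "$gain" "$src" -v "$sf2" - -b 16 "$dst"
)

# Usage:
#   for f in *.wav; do add_noise "$f" "noisy_$f"; done
```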