Sox in phonetic research
Latest revision as of 12:38, 14 April 2020
sox is a great command line tool for working with audio files, and this page illustrates some of the ways phoneticians can make use of it in their research. Once you have learned its syntax you can perform a wide variety of audio processing tasks that can easily be incorporated into a batch processing script to handle hundreds or thousands of audio files.
These are not the only possible usages, and you should consult the sox documentation (http://sox.sourceforge.net/Docs/Documentation) for additional effects and options.
sox command line basics
The basic syntax of a sox command is:
sox input1 [input2..n] output [effect]
An English translation of this is: 'Exchange this file [and these files] into that file [by means of some effect].' (The name sox derives from 'Sound eXchange'.) Portions of the command surrounded by [] are optional.
It's important to note that the sox command always looks for at least one input file and an output file, even when it's not obvious what those files should be. For example, when you want sox to generate one second of white noise and don't need to copy sample rate, etc. from some other file, then specifying an input file may not make sense. In such an event, you can substitute the special filename '-n' where the input file normally goes in the command line to tell sox not to look for a real input file.
Other times you might not want to create a real output file, for example, if you want the output of one sox command to be the input for a second sox command without creating an intermediate file. In that case you can use the special filename '-p' in place of the output filename to tell sox that the output will be piped to another command's input.
The commands shown on this page assume the bash shell, which is standard on OS X and many Linux distributions. The simple commands should work with almost any shell, including cmd on Windows. The commands that invoke sox multiple times will likely not work on non-bash shells, however.
sox aliases
Usually sox is called with the name sox. Standard installations also include aliases named soxi, rec, and play. These are normally in the form of symlinks on Unix-like systems (OS X and Linux) and are duplicates of sox.exe on Windows. If the aliases or copies are not present on your system you can create them for yourself, if you wish.
The aliases don't add any functionality to sox. They are merely convenient ways to run sox with different default behaviors appropriate for extracting information from an audio file (soxi), audio playback (play), or audio recording (rec). You will see soxi in some of the examples included on this page.
Converting audio file types, formats, and channels
Convert file type
This converts an .au file to .wav:
sox input.au output.wav
Change sample rate
This resamples input.wav and creates a new file with a sample rate of 22050Hz:
sox input.wav -r 22050 output.wav
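For batch work the same conversion can be wrapped in a loop. The following is a minimal sketch, assuming a bash-like shell; the function name resample_dir, its arguments, and the default rate are hypothetical and chosen only for illustration.

```shell
# Resample every .wav file in a directory to a target rate.
# resample_dir, the directory arguments, and the default rate are
# illustrative names, not part of sox.
resample_dir() {
    local in_dir="$1"
    local out_dir="$2"
    local rate="${3:-22050}"
    local f
    mkdir -p "$out_dir" || return 1
    for f in "$in_dir"/*.wav; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        sox "$f" -r "$rate" "$out_dir/$(basename "$f")" || return 1
    done
}

# Example: resample_dir recordings resampled 22050
```

Writing the results to a separate directory keeps the original recordings untouched, which is usually what you want when a script processes many files at once.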
Change sample rate and data size
This creates a new file with the specified sample rate and specifies that sample values will be 8 bits/sample.
sox input.wav -r 22050 -b 8 output.wav
Change sample rate and file type
This converts .wav to .aiff and resamples the signal to 12000Hz:
sox input.wav -r 12000 output.aiff
Override a sample rate
Occasionally an audio file's header misidentifies the sample rate of the signal because of an error when the file was created or processed (or if the file is a raw audio file there is no header). You can override the header sample rate value by specifying the real sample rate as an input file option:
sox -r 22050 input.wav output.wav
In this example the signal in output.wav contains the same number of samples as input.wav (i.e. it is *not* resampled), and the value of the sample rate property in the header will be 22050Hz, regardless of the sample rate value in input.wav's header.
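For a headerless raw file, sox has nothing to read the format from, so the sample rate, encoding, sample size, and channel count all have to be given as input file options. The following is a sketch only: it assumes the raw data are 16-bit signed-integer mono PCM, and the wrapper name raw_to_wav is hypothetical.

```shell
# Convert a headerless (raw) file to .wav by stating its format explicitly.
# The format below (16-bit signed-integer PCM, mono) is an assumption
# about the recording; adjust it to match your data.
raw_to_wav() {
    local raw="$1"
    local wav="$2"
    local rate="${3:-22050}"
    # Options placed before the input filename describe the input file.
    sox -t raw -r "$rate" -e signed-integer -b 16 -c 1 "$raw" "$wav"
}

# Example: raw_to_wav input.raw output.wav 22050
```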
Average two stereo channels into a single mono channel
sox stereo_input.wav -c 1 mono_output.wav
Here stereo_input.wav is a two-channel audio file. The '-c 1' output format option selects a single-channel output, and by default sox produces that channel by averaging the two input channels.
Remove a channel from a stereo recording to make a mono audio file
Some digital recorders always make stereo recordings, regardless of whether you use two microphones or not, resulting in a recording that either (1) has one channel with no audio other than amplifier noise; or (2) has two identical channels. In either case it is efficient to remove the unneeded channel from the recording--since your new file contains only half the data (not counting the file header), processing will be easier and faster, and you'll only need about half the storage to save the file.
To remove a channel use the remix effect.
sox stereo_input.wav left.wav remix 1
This copies only the first (left) channel to the output file. To copy the second (right) channel, do this:
sox stereo_input.wav right.wav remix 2
Displaying info from an audio file header
The soxi alias can be used to display information on file duration, sample rate, etc. from an audio file header. Alternatively, call sox with '--info' as its first argument to make it behave like the soxi alias. Since soxi might not be available on some systems, 'sox --info' is recommended.
Pretty printing header info
This produces a formatted display of audio file header values:
sox --info input.wav
Sample output looks like this:
Input File     : 'input.wav'
Channels       : 1
Sample Rate    : 44100
Precision      : 16-bit
Duration       : 00:00:02.23 = 98368 samples = 167.293 CDDA sectors
File Size      : 197k
Bit Rate       : 706k
Sample Encoding: 16-bit Signed Integer PCM
Retrieving bare header values
If you want to query an audio file header in a batch script it is more convenient to retrieve unformatted header values. For example, you can query the sample rate with the '-r' option:
sox --info -r input.wav
which simply returns this value from our sample file:
44100
These are the most important '--info' options:
-t    # show detected file-type
-r    # show sample-rate
-c    # show number of channels
-s    # show number of samples (0 if unavailable)
-d    # show duration in hours, minutes and seconds (0 if unavailable)
-D    # show duration in seconds (0 if unavailable)
-b    # show number of bits per sample
-B    # show the bitrate averaged over the whole file (0 if unavailable)
-e    # show the name of the audio encoding
See the documentation for the soxi alias for all possible options.
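As an illustration of querying bare header values in a batch script, the sketch below lists the files whose header sample rate differs from an expected value. The function name report_rate_mismatch is hypothetical; only 'sox --info -r' comes from the text above.

```shell
# List the files whose header sample rate differs from the expected one.
# report_rate_mismatch is an illustrative name, not a sox command.
report_rate_mismatch() {
    local expected="$1"
    shift
    local f rate
    for f in "$@"; do
        rate=$(sox --info -r "$f") || continue   # skip unreadable files
        if [ "$rate" != "$expected" ]; then
            printf '%s\t%s\n' "$f" "$rate"
        fi
    done
}

# Example: report_rate_mismatch 44100 *.wav
```

Each output line holds a filename and its sample rate separated by a tab, which makes the result easy to pass on to other tools.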
Extracting a portion of an audio file
Use the trim effect to extract a portion of an audio file. For example, you can extract the first 10 seconds of an input file with:
sox input.wav shorter.wav trim 0 10
The two numbers supplied to the trim effect define the start time and duration of the extracted audio. If the second number is prefixed with '=', then the second number defines the end time of the extraction. The following commands are equivalent:
sox input.wav shorter.wav trim 11.2 5       # extract 5 seconds of audio, starting at 11.2 seconds
sox input.wav shorter.wav trim 11.2 =16.2   # extract audio from 11.2 to 16.2 seconds
If a time is prefixed with '-' it is interpreted relative to the end of the audio. This extracts the final 7.5 seconds:
sox input.wav shorter.wav trim -7.5
As you can see, if no second number is provided sox extracts until the end of the file.
According to the sox documentation times are provided in hh:mm:ss.fraction format. In practice it is not necessary to convert values greater than 60 seconds to minutes and hours, and these are equivalent:
sox input.wav shorter.wav trim 139.2 88.3
sox input.wav shorter.wav trim 02:19.2 01:28.3
Finally, you can extract using sample numbers instead of times by suffixing the numbers with 's'. Both of these commands extract samples 8000 to 10,000:
sox input.wav shorter.wav trim 8000s 2000s
sox input.wav shorter.wav trim 8000s =10000s
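When the start and end times come from a list of labels (for example, times exported from an annotation tool), the trim commands above can be generated in a loop. This is a sketch under stated assumptions: the label file layout (start, end, and name separated by whitespace, one segment per line) and the function name extract_segments are hypothetical.

```shell
# Cut one output file per line of a label file.
# Each line of the label file is assumed to hold: start end name
# (times in seconds, whitespace-separated). The function name and the
# label-file layout are illustrative assumptions.
extract_segments() {
    local input="$1"
    local labels="$2"
    local start end name
    while read -r start end name; do
        [ -n "$name" ] || continue       # skip blank or incomplete lines
        # </dev/null keeps sox from consuming the label file on STDIN
        sox "$input" "${name}.wav" trim "$start" "=$end" < /dev/null || return 1
    done < "$labels"
}

# Example: extract_segments input.wav labels.txt
```

The '=' prefix is used so that the second value in each line can be an end time rather than a duration.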
Concatenating files
Simple concatenation of two files is easy:
sox file1.wav file2.wav combined.wav
But usually this method does not produce the best results--often you will find an audible click at the point where the two files are joined. The initial values of the second file usually are not a continuation of the trajectory of the final values of the first, and the resulting discontinuity in the combined signals is perceived as a click.
You can add the splice effect to your concatenation to improve the transition between the two signals. With this effect sox will overlap the two signals by an amount you define, called the excess. During this interval the first signal fades out from full strength to zero, and the second signal fades in from zero to full strength:
sox file1.wav file2.wav combined.wav splice $(soxi -D file1.wav),0.010
The first argument to the splice effect is the position in the first file where the fade out should end, and in the above example the subcommand $(soxi -D file1.wav) is used to interpolate the file length into this argument, which is a complicated way of saying that the fade out should end when the first file ends. The value of the second splice argument, 0.010, tells sox that the fade out and fade in should occur over a 10ms window of the input files. You'll notice that this overlap means that the duration of the concatenated file is slightly shorter than the sum of the durations of the input files.
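Since the soxi alias may be missing on some systems, the same splice can be written with 'sox --info -D'. The wrapper below is a sketch; the name splice_join and the default excess of 0.010 seconds are illustrative choices that mirror the example above.

```shell
# Join two files with a crossfade, reading the first file's duration
# with `sox --info -D` rather than the soxi alias.
# splice_join and its default excess are illustrative choices.
splice_join() {
    local first="$1"
    local second="$2"
    local output="$3"
    local excess="${4:-0.010}"
    local duration
    duration=$(sox --info -D "$first") || return 1
    sox "$first" "$second" "$output" splice "${duration},${excess}"
}

# Example: splice_join file1.wav file2.wav combined.wav 0.010
```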
Synthesis
sox has a synth effect for generating tones or wideband noise that can be used in standalone form, or concatenated or mixed with an existing audio file.
Generate a periodic waveform
This generates a 2.25 second sine wave at 300Hz (16-bit; by default sox produces 32-bit files, and these are not readable by all audio programs):
sox -n -b 16 output.wav synth 2.25 sine 300
The '-n' input file parameter indicates that no input file is used to determine the output file's length or data format.
Generate a periodic waveform with a specific amplitude
The preceding synthesis command creates a full-height wave (amplitude +/-1). You can chain the vol effect to your synthesis to produce a wave of a specified height:
sox -n -b 16 output.wav synth 2.25 sine 300 vol 0.5
This is useful when you want to mix multiple waves and want full control over their relative amplitudes.
Generate the sum of two periodic waveforms, with default mixing
This chains two synth effects to add a 250Hz square wave to the sine wave:
sox -n -b 16 output.wav synth 2.25 sine 300 synth 2.25 square mix 250
Note the mix parameter adds the second synthesis to the first. By default sox scales the amplitude of each input channel by 1/n, where n is the number of channels, to prevent clipping.
See the sox documentation for additional options on creating swept rather than fixed frequency tones, and specifying phase, dc offset, and more.
Generate the sum of multiple waveforms of specific heights
A simple way to create a complex waveform of differing amplitudes is to first synthesize the component waves and mix them together with 'sox -m':
sox -n 100full.wav synth 2 sine 100
sox -n 200half.wav synth 2 sine 200 vol 0.5
sox -m 100full.wav 200half.wav -b 16 complex.wav
In this example, the full height wave in 100full.wav is combined with the half-height 200Hz signal in 200half.wav. The result is in the 16-bit file complex.wav. Note that when we mix with 'sox -m' the value of each channel is divided by the number of channels being mixed to prevent clipping. The result is that the sample values of the complex signal do not cover the entire allowed amplitude range of a .wav file since the second synthesized file is half-height.
You can also use the bash process substitution syntax <( ) to generate the 100Hz and 200Hz signals in place and skip the creation of intermediate files:
sox -m <(sox -n -p synth 2 sine 100) <(sox -n -p synth 2 sine 200 vol 0.5) -b 16 complex.wav
In this example, the input files to 'sox -m' are replaced with synthesis commands that generate waves of differing height and that send their output to STDOUT streams via the special '-p' output file parameter. The enclosing 'sox -m' command reads these streams from the synthesis subcommands as separate input files and mixes them together. The results of mixing with and without intermediate files are exactly the same.
You can increase the number of input files to create even more complex combinations:
sox -m <(sox -n -p synth 2 sine 100) <(sox -n -p synth 2 sine 200 vol 0.5) <(sox -n -p synth 2 sine 300 vol 0.25) -b 16 complex.wav
This command adds a quarter-height 300Hz signal to the preceding complex signal.
Generate an aperiodic waveform (noise)
This produces a 3.5-second file with 16 bits/sample. The '-n' parameter tells sox that no input file is used to determine the duration or data format:
sox -n -b 16 noise.wav synth 3.5 whitenoise
You can also use noise with different spectral characteristics by using 'brownnoise' or 'pinknoise' instead of 'whitenoise'.
Generate an aperiodic waveform to match an audio file
You will often want to produce a noise file that matches the recording parameters of some other file, and for that you specify an input file:
sox input.wav noise.wav synth whitenoise
This produces an output file that has the same number of samples (duration) and sample rate as the input file.
Add noise to an audio file
You can combine two sox commands to add noise to an input file in a new output file. These commands (1) generate noise matching the duration and sample rate of the input file and then (2) mix the generated noise with the input file to produce the new output:
sox input.wav -p synth whitenoise vol 0.02 | sox -m input.wav - addednoise.wav
In the above example the '-p' parameter replaces the output file name in the first command and tells sox to pipe the noise output to the next sox command instead of to a file. This piped data stream shows up in the second command as the '-' argument, where the '-m' option tells sox to mix it with input.wav to produce the output file addednoise.wav.
It's important to note that the above example employs the simplest kind of mixing in which values of the two input files are simply added together, which introduces the risk of clipping in your output file. This clipping risk might not be a problem if you know that the amplitude range of your input file leaves enough room for the noise that you want to add. For instance, if the amplitude values in your input file have a maximum/minimum of +-0.6 you can add noise with a range of +-0.4 with no danger of clipping. If you want to add more noise than that you will need to apply the 'vol' gain effect or the 'norm' effect to prevent clipping.
An alternative syntax for the previous command is to use the shell's process substitution construct <() to create the noise as the second input file to 'sox -m':
sox -m input.wav <(sox input.wav -p synth whitenoise vol 0.02) addednoise.wav
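To judge how much room the input file leaves for added noise, you can read its peak amplitude from the stat effect. The following is a sketch only: the function name peak_amplitude is hypothetical, and the parsing assumes that the stat report contains lines beginning with "Maximum amplitude:" and "Minimum amplitude:".

```shell
# Print the largest absolute sample value reported by the stat effect.
# stat writes its report to STDERR, hence the 2>&1 redirection.
# peak_amplitude is an illustrative name, and the parsing assumes the
# "Maximum amplitude:" and "Minimum amplitude:" lines of the stat report.
peak_amplitude() {
    sox "$1" -n stat 2>&1 | awk '
        /^(Maximum|Minimum) amplitude:/ {
            v = $3 + 0
            if (v < 0) v = -v
            if (v > peak) peak = v
        }
        END { printf "%.6f\n", peak }'
}

# Example: peak_amplitude input.wav
```

A result of 0.600000, for instance, corresponds to the case described above in which the input values stay within +-0.6.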
Filtering
The lowpass and highpass effects can be used to filter your audio signal. For instance, if your .wav file contains an aerodynamic signal rather than normal speech audio, you might wish to apply a lowpass filter:
sox input.wav output.wav lowpass 100.0
The above command applies a 100Hz lowpass filter on the input file. A sharper frequency cutoff can be produced with the sinc effect:
sox input.wav output.wav sinc -100.0
A negative value supplied to sinc indicates a lowpass filter, and highpass is selected with a positive value.
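The two forms can also be combined: giving sinc a highpass cutoff followed by a lowpass cutoff, joined with '-', selects a band-pass filter when the first frequency is the lower one. The wrapper below is a sketch; the function name bandpass and the example cutoffs are illustrative.

```shell
# Keep only the band between two cutoff frequencies with the sinc effect.
# bandpass is an illustrative wrapper; the cutoffs are in Hz.
bandpass() {
    local input="$1"
    local output="$2"
    local low="$3"     # highpass cutoff
    local high="$4"    # lowpass cutoff
    sox "$input" "$output" sinc "${low}-${high}"
}

# Example: bandpass input.wav output.wav 100 4000
```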
Adjusting the volume
The norm effect
The vol effect
The vol effect amplifies or attenuates the audio signal.
By default sox
applies a linear scale to the signal. vol
values greater than 1 result in amplification, and values between 0 and 1 result in attenuation. This example doubles the sample values of a signal:
sox input.wav output.wav vol 2
One way to maximize the amplitude of a signal without clipping is to use the stat effect to retrieve a multiplier that will result in a maximized signal and pass it to the vol effect:
sox input.wav output.wav vol $(sox input.wav -n stat -v 2>&1)
In the above example two calls to sox are necessary. The second call is performed via the shell's command substitution syntax $() and retrieves the multiplier (which sox outputs to STDERR, which we redirect to STDOUT with the 2>&1 syntax), and this multiplier is used as the literal value of the vol effect in the first command.
Note that the above complex command achieves the same result as the simple norm effect:
sox input.wav output.wav norm
The complex command may be faster than norm since it does not create a temporary file during processing, which norm does. For long audio files this advantage could be considerable. The complex command might not work for some types of encodings (PCM should be fine), and the 2>&1 syntax is not available in all shells.
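To make the multiplier more concrete, here is a small arithmetic sketch. It assumes that the value reported by `stat -v` is essentially the reciprocal of the largest absolute sample value, which is what "as loud as possible without clipping" implies; `gain_for_peak` is a hypothetical helper name, not a sox feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the linear gain that brings a signal with the
# given peak (largest absolute sample value, 0 < peak <= 1) to full scale.
gain_for_peak() {
    awk -v p="$1" 'BEGIN { if (p <= 0) exit 1; printf "%.3f\n", 1 / p }'
}

gain_for_peak 0.5    # prints 2.000: a peak of 0.5 leaves room for a gain of 2
```

A signal that already peaks at full scale yields a gain of 1, so the vol effect leaves it unchanged.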
Scale the amplitude of an audio file so that the max/min values fit within a specific portion of the possible range
Here's a more complicated example. Say we want to scale our audio signal so that sample values are in the range +/-0.8 rather than the allowed maximum range +/-1.0, perhaps because we want to add noise with an amplitude range of +/-0.2. To do this we can use the stat effect to grab the vol multiplier and multiply it by 0.8:
sox input.wav output.wav vol $(echo "$(sox input.wav -n stat -v 2>&1) * 0.8" | bc -l)
This example makes use of nested command substitution to first retrieve, then scale, the multiplier, which is subsequently used as the literal value of the vol effect. The bc calculator must be available on your system in order to perform the multiplication.
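Two variations on this command may be useful. The sketch below is illustrative only: `scale_gain` and `fraction_to_db` are hypothetical helper names. The first uses awk for the multiplication on systems where bc is missing. The second converts a linear fraction of full scale into decibels, on the assumption that your version of sox accepts an optional dB level as the argument to norm (check `man sox` on your system).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of sox.

# Multiply a gain multiplier by a fraction of full scale, using awk
# instead of bc.
scale_gain() {
    awk -v g="$1" -v f="$2" 'BEGIN { printf "%.6f\n", g * f }'
}

# Convert a linear fraction of full scale (0 < f <= 1) to decibels.
fraction_to_db() {
    awk -v f="$1" 'BEGIN { printf "%.2f\n", 20 * log(f) / log(10) }'
}

# Example usage (requires sox and a real input.wav):
#   sox input.wav output.wav vol "$(scale_gain "$(sox input.wav -n stat -v 2>&1)" 0.8)"
#   sox input.wav output.wav norm "$(fraction_to_db 0.8)"
```

A fraction of 0.8 corresponds to roughly -1.94 dB, so the second usage line asks norm for a peak a little under 2 dB below full scale.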
Add noise to an audio file without clipping
Putting it all together, here's one way to add noise to an audio file and be sure to avoid clipping. Note that the command is broken into multiple lines for readability. The \ character tells the shell that the command continues on the following line.
sf1=0.94; sf2=0.05; \
sox -m \
-v $(echo "$(sox input.wav -n stat -v 2>&1) * ${sf1}" | bc -l) input.wav \
-v ${sf2} <(sox input.wav -p synth whitenoise) \
-b 16 output.wav
Let's break this down. This complex command mixes two input files produced by subcommands. The basic form of a mixing command is:
sox -m [-v factor1] input1.wav [-v factor2] input2.wav output.wav
Here input1.wav and input2.wav are mixed to produce output.wav. If provided, the -v factorN arguments linearly scale the corresponding input files.
In our complex command the first line
sf1=0.94; sf2=0.05;
sets the value of two shell variables, the scaling factors sf1 and sf2. These are used to scale the amplitudes of the two input files.
The second line
sox -m
tells sox to look for multiple input files to mix into an output file.
The third line identifies the first input file, which will be an amplitude-scaled version of our real input file:
-v $(echo "$(sox input.wav -n stat -v 2>&1) * ${sf1}" | bc -l) input.wav
As we saw earlier, the $(echo "$(sox input.wav -n stat -v 2>&1) * ${sf1}" | bc -l) construct creates a gain multiplier that scales a signal to a percentage (${sf1}; 0.94) of the maximum possible amplitude. This multiplier is used as the literal value of the -v argument, scaling input.wav so that it uses 94% of the allowed amplitude range.
The next line creates and scales a noise input based on the duration and sample rate of input.wav:
-v ${sf2} <(sox input.wav -p synth whitenoise)
Here the shell's process substitution construct <() is used to chain the output of a nested command to the input of the enclosing sox -m command. The special output filename -p is used in place of a real output filename so that the output data can be used as input to the enclosing sox -m command. Since synth generates full-amplitude noise by default, we use -v ${sf2} to scale the noise to 5% (0.05) of the maximum. (Note the subtle difference between process substitution, used in this line, and command substitution $(), used in the preceding line. Command substitution produces a string to be used as a literal value in the enclosing command. Process substitution is used to pipe data to the enclosing command. The point is subtle because sox can use either a literal filename or a data stream as an input file.)
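The difference between the two constructs can be seen without sox at all. This is a minimal Bash sketch; the function names are invented for the demonstration, and the exact path printed by the last line varies between systems.

```shell
#!/usr/bin/env bash
# Command substitution: the inner command's output is inserted as text.
cmd_sub_demo() {
    echo "vol $(printf '%s' 0.8)"
}

# Process substitution: the inner command's output is offered to the outer
# command as a readable file; cat opens it just as it would a named file.
proc_sub_demo() {
    cat <(printf 'streamed data\n')
}

cmd_sub_demo     # prints: vol 0.8
proc_sub_demo    # prints: streamed data
echo <(true)     # prints the path the shell substitutes, often /dev/fd/63
```

In the mixing command, sox plays the role of cat: it receives a path from the shell and reads the noise from it as though it were an ordinary input file.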
The final line identifies the output filename and data size:
-b 16 output.wav
The -b 16 format is important here, as your output will have a data size of 32 bits/sample without it, which is more than you need, and other software might not be able to deal with the deeper bit depth. The reason this happens is that the -p special filename pipes data using sox's internal 32-bit format, and this makes the mixing command default to the same.
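For batch processing it can be convenient to wrap the whole command in a shell function. The following is one possible sketch rather than a definitive implementation: `add_noise` is a hypothetical name, the defaults mirror the sf1 and sf2 values used above, and the check that the two factors sum to at most 1.0 is an added safeguard based on the worst case in which both signals peak at the same sample.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the mixing command discussed above.
# Usage: add_noise INPUT OUTPUT [SF1] [SF2]
add_noise() {
    local src="$1" dst="$2" sf1="${3:-0.94}" sf2="${4:-0.05}"

    # Refuse factor pairs whose worst-case sum could exceed full scale.
    if ! awk -v a="$sf1" -v b="$sf2" \
            'BEGIN { exit !(a > 0 && b >= 0 && a + b <= 1.0) }'; then
        echo "add_noise: sf1 + sf2 must not exceed 1.0" >&2
        return 2
    fi

    # Stop early if the input file cannot be read.
    if [ ! -r "$src" ]; then
        echo "add_noise: cannot read $src" >&2
        return 1
    fi

    # Gain that scales the input to the fraction sf1 of full scale.
    local gain
    gain=$(echo "$(sox "$src" -n stat -v 2>&1) * ${sf1}" | bc -l)

    # Mix the scaled input with scaled white noise; write 16-bit output.
    sox -m \
        -v "$gain" "$src" \
        -v "$sf2" <(sox "$src" -p synth whitenoise) \
        -b 16 "$dst"
}

# Example usage (requires sox, bc, and real .wav files):
#   for f in *.wav; do add_noise "$f" "noisy_$f"; done
```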