Difference between revisions of "Acoustic Analysis"

From Phonlab
(4 intermediate revisions by 2 users not shown)

Latest revision as of 10:26, 26 April 2018

Burst detection (burst.py)

This script defines a function burst()

-- input: 1) the name of a sound file, 2) the start time and 3) the end time (in seconds) of the analysis window in which we will look for a burst.

-- output: 1) a burst_time (in seconds), and 2) a burst_score, which measures how burst-like the detected event is.


The script does the following things:

1) Resample the sound file to 16,000 Hz.
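
As a rough illustration of step 1, the sketch below resamples a list of samples to 16,000 Hz by linear interpolation. This is not how burst.py does it (the page does not say which resampler is used); the function name resample_linear is hypothetical, and a production resampler would also low-pass filter before downsampling to avoid aliasing.

```python
def resample_linear(samples, orig_rate, target_rate=16000):
    """Resample a mono signal by linear interpolation (illustration only).

    Assumption: plain linear interpolation with no anti-aliasing filter,
    chosen to keep the sketch dependency-free.
    """
    if orig_rate == target_rate or len(samples) < 2:
        return [float(s) for s in samples]
    # Number of output samples that fit inside the original duration.
    n_out = (len(samples) - 1) * target_rate // orig_rate + 1
    out = []
    for i in range(n_out):
        pos = i * orig_rate / target_rate      # fractional index into the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out
```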


With the Audio Waveform:

2) Within the window specified by the start and end times, look at each sample and determine whether it is a peak or a valley.

3) Find the three biggest valleys in the waveform (corresponding to pressure peaks) within the specified time window. The biggest valleys are those with the largest difference between the valley and its adjacent samples. This is the waveform_difference.
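
Steps 2 and 3 can be sketched as follows. The function name top_valleys is hypothetical, and the exact definition of waveform_difference is an assumption here: the sketch sums the drop from each of the two neighboring samples down to the valley, which may differ from what burst.py computes.

```python
def top_valleys(samples, rate, start, end, n=3):
    """Return up to n (time, waveform_difference) pairs, biggest valley first.

    Assumption: waveform_difference = (previous - valley) + (next - valley).
    """
    first = max(1, int(round(start * rate)))
    last = min(len(samples) - 2, int(round(end * rate)))
    candidates = []
    for i in range(first, last + 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        if cur < prev and cur < nxt:            # local minimum, i.e. a valley
            diff = (prev - cur) + (nxt - cur)
            candidates.append((diff, i / rate))
    candidates.sort(reverse=True)               # largest difference first
    return [(t, d) for d, t in candidates[:n]]
```

With this definition every reported waveform_difference is strictly positive, so taking its logarithm later is safe.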


Compute a series of Mel Frequency Spectra:

4) Take Mel frequency spectra from 300 Hz to 8000 Hz, in non-overlapping 5 ms windows, spanning the interval from the start time to the end time. This is done with the ESPS routine melspec().

5) Compare successive windows, and select the top three candidates with the most change in the spectrum. This is done with the ESPS routine diff(). The amount of change is the spectral_difference.
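
The windowing and comparison in steps 4 and 5 can be sketched as below. The sketch does not call melspec() or diff(); it assumes the Mel spectra have already been computed as one list of band values per 5 ms window. The function names are hypothetical, and measuring spectral change as the sum of absolute band differences is an assumption, since the page does not say how diff() measures it.

```python
def frame_bounds(start, end, rate=16000, win_ms=5):
    """Sample index ranges of non-overlapping windows (80 samples at 16 kHz)."""
    win = rate * win_ms // 1000
    first = int(round(start * rate))
    last = int(round(end * rate))
    return [(s, s + win) for s in range(first, last - win + 1, win)]


def top_spectral_changes(spectra, start, win_s=0.005, n=3):
    """Return up to n (time, spectral_difference) pairs, largest change first.

    Assumption: change between windows k-1 and k is the sum of absolute
    differences across bands, time-stamped at the boundary between them.
    """
    changes = []
    for k in range(1, len(spectra)):
        diff = sum(abs(a - b) for a, b in zip(spectra[k], spectra[k - 1]))
        changes.append((diff, start + k * win_s))
    changes.sort(reverse=True)
    return [(t, d) for d, t in changes[:n]]
```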


Combine these two acoustic landmarks into a burst score:

6) Compare waveform candidates to spectrum candidates, and keep those where the times align.

7) Calculate a burst score (burst strength) for the remaining candidates. This is done using a linear discriminant function trained on bursts in TIMIT.

The burst score b is calculated as: b = -1.814 + 0.618*log(waveform_difference) + 0.003*spectral_difference


8) Select the candidate with the highest burst score, and report the score and the time location of the burst (based on the waveform peak, which is more accurate).

9) If no burst is detected, return a burst score of 0 and a burst_time of -1.
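
Steps 6 through 9 can be sketched as follows, using the coefficients given above. Several details are assumptions rather than facts about burst.py: log is taken to be the natural logarithm, candidates are taken to "align" when their times differ by at most 5 ms (the tolerance parameter), "no burst" is taken to mean that no waveform candidate aligns with a spectrum candidate, and candidates are represented as (time, difference) pairs. The function names are hypothetical.

```python
import math


def burst_score(waveform_difference, spectral_difference):
    """Linear discriminant score from the page (natural log assumed)."""
    return (-1.814
            + 0.618 * math.log(waveform_difference)
            + 0.003 * spectral_difference)


def pick_burst(wave_cands, spec_cands, tolerance=0.005):
    """Return (burst_time, burst_score), or (-1, 0) if nothing aligns.

    wave_cands and spec_cands are lists of (time, difference) pairs.
    The reported time is the waveform candidate's time.
    """
    best = None
    for w_time, w_diff in wave_cands:
        for s_time, s_diff in spec_cands:
            if abs(w_time - s_time) <= tolerance:    # step 6: times align
                score = burst_score(w_diff, s_diff)  # step 7
                if best is None or score > best[1]:  # step 8
                    best = (w_time, score)
    return best if best is not None else (-1, 0)     # step 9
```

For example, pick_burst([(0.101, 1000.0)], [(0.100, 400.0)]) pairs the two candidates because they are 1 ms apart and reports the burst at 0.101 s.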