Chapter 8 - Speech synthesis demo
Speech sounds can be minimally specified in terms of a small set of parameters, each of which can be described either in terms of how it is produced (its physiological characteristics) or in terms of its physical (acoustic) characteristics.
Some of these parameters are isolated in the synthesized speech tokens in this table. For example, token number 1 (linked in the column labeled "1") is composed of a monotone voice with only a first formant resonance. The spectrogram of this utterance shows only one formant. Token 4 combines the first three formants, token 5 is composed of only stop release bursts and fricative noise, and finally, in token 7, the voice has normal fundamental frequency variation.
This speech was synthesized in 1971 by Peter Ladefoged on a synthesizer at UCLA. The values of the parameters were a modified version of a set provided by John Holmes.
| | PHYSIOLOGICAL | ACOUSTIC | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Rate of vibration of the vocal folds | Fundamental frequency | | | | | | | |
| 2 | First resonance of the vocal tract | Formant 1 frequency | | | | | | | |
| 3 | | Formant 1 amplitude | | | | | | | |
| 4 | Second resonance of the vocal tract | Formant 2 frequency | | | | | | | |
| 5 | | Formant 2 amplitude | | | | | | | |
| 6 | Third resonance of the vocal tract | Formant 3 frequency | | | | | | | |
| 7 | | Formant 3 amplitude | | | | | | | |
| 8 | Fricative and stop bursts | Center of noise frequency | | | | | | | |
| 9 | | Amplitude of noise | | | | | | | |
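
As a rough illustration of how parameters like those in the table can drive a synthesizer, the following Python sketch builds something in the spirit of token 1: a monotone pulse source (parameter 1 held constant) passed through a single formant resonator (parameters 2 and 3). It is not the UCLA synthesizer, and it does not use the Holmes parameter set. The function names, the 100 Hz fundamental, the 500 Hz formant, the 60 Hz bandwidth, and the 16 kHz sample rate are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed value)


def pulse_train(f0_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Monotone voice source: one unit pulse per pitch period."""
    period = round(sample_rate / f0_hz)  # samples between pulses
    n_samples = int(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def formant_resonator(signal, freq_hz, bandwidth_hz, amplitude=1.0,
                      sample_rate=SAMPLE_RATE):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    c = -r * r
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(amplitude * y)  # formant amplitude scales the output
        y2, y1 = y1, y
    return out


def magnitude_at(signal, freq_hz, sample_rate=SAMPLE_RATE):
    """Magnitude of a single Fourier component of the signal."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(x * math.cos(w * i) for i, x in enumerate(signal))
    im = sum(x * math.sin(w * i) for i, x in enumerate(signal))
    return math.hypot(re, im)


def synthesize_token1():
    """Hypothetical stand-in for token 1: fixed F0, first formant only."""
    source = pulse_train(f0_hz=100.0, duration_s=0.5)
    return formant_resonator(source, freq_hz=500.0, bandwidth_hz=60.0)


token1 = synthesize_token1()
# Compare energy near the formant with energy well above it.
print(magnitude_at(token1, 500.0) > magnitude_at(token1, 1500.0))
```

Under these assumptions, the component at 500 Hz comes out stronger than the one at 1500 Hz, which corresponds to the single formant band visible in a spectrogram. Adding second and third resonators, a noise source, or a time-varying fundamental would extend the sketch toward the other tokens.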