Open access

Acoustics of the Vowel

Preliminaries

Dieter Maurer

It seems as if the fundamentals of how we produce vowels and how they are acoustically represented have been clarified: we phonate and articulate. Using our vocal cords, we produce a vocal sound or noise which is then shaped into a specific vowel sound by the resonances of the pharyngeal, oral, and nasal cavities, that is, the vocal tract. Accordingly, the acoustic description of vowels relates to vowel-specific patterns of relative energy maxima in the sound spectra, known as patterns of formants.
The intellectual and empirical reasoning presented in this treatise, however, gives rise to scepticism with respect to this understanding of the sound of the vowel. The reflections and materials presented provide reason to argue that, up to now, a comprehensible theory of the acoustics of the voice and of voiced speech sounds is lacking, and consequently, no satisfying understanding of vowels as an achievement and particular formal accomplishment of the voice exists. Thus, the question of the acoustics of the vowel – and with it the question of the acoustics of the voice itself – proves to be an unresolved fundamental problem.

12    Empirical Falsification despite Methodological Limitations of Determining Patterns of Relative Spectral Envelope Maxima or Formant Patterns

12.1    Lack of Methodological Basis for Verifying Prevailing Theory

Concerning isolated vowel sounds exhibiting quasi-static spectral characteristics and allowing for clear perceptual vowel recognition and distinction, it is not possible, in a particular language, to formulate general rules for determining patterns of relative spectral energy maxima or of formant patterns which consistently correspond to the perceived vowel quality of the sounds.

Consequently, it is not possible to gather general statistical data on vowel-specific formant frequencies of recognisable vowel sounds referring to the entire realm of utterances.


Prevailing theory cannot be verified for methodological reasons.


From a methodological perspective, prevailing theory is thus not equipped with adequate analytical instruments for capturing and describing the phenomenon of the vowel.

Existing references regarding formant statistics do not disclose this problem for two reasons: firstly, the investigated speakers are generally not subject to a qualitative selection regarding their vocal abilities; secondly, such statistics generally exclude any systematic and extensive variation of fundamental frequency. Both factors, however, are essential prerequisites for studying the possible fundamental frequency ranges of intelligible vowel sounds and for examining the appropriateness of the methods of acoustic analysis with regard to the entire realm of utterances. (Moreover, qualitative speaker selection also allows for the study of other important aspects of vowel-sound variation, above all variation of vocal effort, register and phonation type.)

12.2    Systematic Divergence of Empirical Findings from Predictions of Prevailing Theory

If lower relative spectral energy maxima can be determined, and if the corresponding formant frequency calculation can be methodically substantiated, then in most cases the resulting patterns ≤ 1.5 kHz, that is, in the lower frequency range of the spectra, prove to be dependent on the fundamental frequency of the sounds relative to the recognised vowel.

Speakers of a given speech community, despite having different vocal-tract sizes and thus belonging to different speaker groups, are nevertheless able to produce the sounds of one and the same vowel at quasi-identical fundamental frequencies and with quasi-identical lower formant frequencies ≤ 1.5 kHz. Moreover, speakers with comparatively larger vocal-tract sizes can produce sounds of some vowels at higher fundamental frequencies and with higher F1 or even higher F1–F2 values than speakers with comparatively smaller vocal-tract sizes.

These empirical findings are reciprocally related. They diverge systematically from both the predicted independence of vowel-specific formant patterns from fundamental frequency and the predicted pervasive dependence of vowel-specific formant patterns on speaker group or vocal-tract size, respectively.


Empirical findings diverge systematically from the predictions of prevailing theory.


From an empirical perspective, prevailing theory thus proves to be inadequate.

12.3    Empirical Findings Directly Contradicting Prevailing Theory

Not only may a single speaker occasionally produce isolated sounds of different vowels that exhibit the same formant pattern F1–F2 or F1–F2–F3; for some vowel qualities, this ambiguity of formant patterns in relation to the perceived vowel quality is systematic when the entire range of fundamental frequencies of intelligible vowel sounds is investigated. In these cases of ambiguity, speakers cannot substantially vary fundamental frequency while maintaining both vowel quality and formant pattern: if the speaker maintains the vowel quality, the formant pattern changes, and if the formant pattern is kept constant, the vowel quality changes.

This observation also holds true for patterns of spectral energy maxima. Moreover, in some cases, as mentioned, even the entire interpretable spectral envelope proves to be ambiguous.


Empirical findings can directly contradict the predictions of prevailing theory.


Consequently, prevailing theory is falsified because, for a substantial portion of vowel sounds, the opposite of what the theory claims to be true actually applies: in many cases, given a variation of fundamental frequency, vowel sounds with very different formant patterns allow for a perception of the same vowel quality, while vowel sounds with similar formant patterns allow for a perception of different vowel qualities.
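
The two kinds of contradiction named above can be made concrete with a small sketch. It assumes a set of vowel sounds that have already been labelled with the perceived vowel quality and for which formant frequencies could be determined at all, which, as Section 12.1 argues, is not generally possible. The function names, the record layout, the similarity tolerance `tol` and the numbers in `toy_sounds` are illustrative assumptions, not measurements or methods from this treatise. The vowel labels are deliberately abstract ("V1", "V2").

```python
from itertools import combinations


def pattern_distance(a, b):
    """Largest relative difference between corresponding formant frequencies (Hz)."""
    return max(abs(x - y) / max(x, y) for x, y in zip(a, b))


def find_contradictions(sounds, tol=0.1):
    """Return two lists of sound pairs.

    ambiguous: similar formant patterns, different perceived vowel quality
    divergent: dissimilar formant patterns, same perceived vowel quality

    'Similar' means every formant differs by at most `tol` (relative);
    this threshold is an assumption made for illustration only.
    """
    ambiguous, divergent = [], []
    for s, t in combinations(sounds, 2):
        similar = pattern_distance(s["formants"], t["formants"]) <= tol
        same_vowel = s["vowel"] == t["vowel"]
        if similar and not same_vowel:
            ambiguous.append((s, t))
        elif same_vowel and not similar:
            divergent.append((s, t))
    return ambiguous, divergent


# Invented placeholder values (Hz); NOT empirical data.
toy_sounds = [
    {"vowel": "V1", "f0": 200, "formants": (400, 800)},
    {"vowel": "V1", "f0": 600, "formants": (600, 1200)},
    {"vowel": "V2", "f0": 200, "formants": (620, 1150)},
]

ambiguous, divergent = find_contradictions(toy_sounds)
for label, pairs in (("ambiguous", ambiguous), ("divergent", divergent)):
    for s, t in pairs:
        print(label, s["vowel"], s["f0"], s["formants"],
              "vs", t["vowel"], t["f0"], t["formants"])
```

In a real investigation, the list would have to cover the entire fundamental frequency range of intelligible vowel sounds for each speaker, since the argument above is that the contradictions become systematic only under such variation.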