Open access

Acoustics of the Vowel


Dieter Maurer

It seems as if the fundamentals of how we produce vowels and how they are acoustically represented have been clarified: we phonate and articulate. Using our vocal cords, we produce a vocal sound or noise which is then shaped into a specific vowel sound by the resonances of the pharyngeal, oral, and nasal cavities, that is, the vocal tract. Accordingly, the acoustic description of vowels relates to vowel-specific patterns of relative energy maxima in the sound spectra, known as patterns of formants.
The intellectual and empirical reasoning presented in this treatise, however, gives rise to scepticism with respect to this understanding of the sound of the vowel. The reflections and materials presented provide reason to argue that, up to now, a comprehensible theory of the acoustics of the voice and of voiced speech sounds is lacking, and consequently, no satisfying understanding of vowels as an achievement and particular formal accomplishment of the voice exists. Thus, the question of the acoustics of the vowel – and with it the question of the acoustics of the voice itself – proves to be an unresolved fundamental problem.

4    Vowels and Fundamental Frequency

4.1    Fundamental Frequency, First Formant and “Grade” of Vowels

According to prevailing theory, vowel-specific formant patterns are independent of the fundamental frequency of their respective individual sounds.

In general, the frequencies of the first formant of all vowels, as specified in current formant statistics for sounds produced in citation-form words, comparable to relaxed speech, lie within the range of the possible fundamental frequencies for the speakers of a given speaker group. Concerning long German vowels, the lowest statistical values for F1 are given for / i, y, u /, medium values for /e, ø, o /, followed by values for /ε, ɔ / and the highest values are indicated for /a–α /.

If the fundamental frequency involved in producing vowel sounds exceeds the frequencies of the first formant of / i, y, u / and approaches the frequencies of the first formant of /e, ø, o /, then it is to be expected that the vowels / i, y, u / become unintelligible because their first vowel-specific formant is no longer physically representable. Thus, the vowels / i, y, u / would be of a “lower grade”, that is, more restricted in their production, physical representation and intelligibility than the other vowels. The same would apply to /e, ø, o / compared to /ε, a, α, ɔ / and to /ε, ɔ / compared to /a–α /.
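This expectation can be stated compactly, assuming an idealised harmonic source in which the spectral components of a voiced sound lie at integer multiples of the fundamental frequency. The symbols $f_0$, $h_k$ and $F_1$ are notation introduced here for illustration only:

```latex
h_k = k \cdot f_0, \quad k = 1, 2, 3, \dots
\qquad\Longrightarrow\qquad
f_0 > F_1 \;\Rightarrow\; h_k > F_1 \ \text{for all } k \ge 1 .
```

Under this assumption, once the fundamental frequency exceeds the statistical F1, no harmonic lies at or below that frequency, which is why prevailing theory would expect the first formant to be no longer physically representable.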

In line with prevailing theory, the possibility that the fundamental frequency of a vowel sound can exceed the first formant frequency of a vowel quality as given in formant statistics leads to the assumption that the “grade” of vowels differs because of vowel-specific acoustic characteristics.

However, everyday experience refutes such a generalising conclusion. If speakers of a given speaker group produce vowel sounds, and if the fundamental frequency of these sounds exceeds the frequencies of the statistically given first formant of / i, y, u / and approaches the frequencies of the first formant of /e, ø, o /, then all of the six vowels mentioned can be produced with the same “grade” of vowel perception, given speakers with correspondingly good vocal abilities. There is no general impairment of vowel perception for the sounds of / i, y, u / if the fundamental frequency exceeds statistical F1.

The same holds true—although it is less obvious in everyday utterances and only for good voices—for the vowels /e, ø, o / produced at fundamental frequencies higher than the statistical values of their first formant frequencies.

Speakers with excellent vocal abilities can even produce clearly intelligible cardinal vowels up to a fundamental frequency that corresponds to the highest statistical F1 of all vowels of the language they master.

In this context, special attention needs to be given to everyday speaking styles or habits that exhibit a fundamental frequency variation of one octave or more. Such styles and habits plainly reveal the significance of the problem of fundamental frequencies above statistical first-formant frequencies, confronting the prevailing acoustic theory of the vowel.

Special attention also needs to be given to utterances of stage voices (in musical and straight theatre, entertainment, film, television etc.) because extensive fundamental frequency variation is one of the hallmarks of the singing and speaking voice in the context of art and entertainment.

Generally, with regard to a fundamental frequency range up to the maximum frequency of the first formant as given in formant statistics, no principally different “grades” of vowel perception in relation to fundamental and first formant frequency can be experienced.

4.2    Fundamental Frequency, Spectral Envelope, Formant Pattern and “Grade” of Vowels

If the fundamental frequency of a sound increases, so too does the frequency spacing between the harmonics in the spectrum. As a consequence, determining the spectral envelopes and their maxima becomes difficult. The same applies to the calculation of formant frequencies. According to prevailing theory, it is to be expected that the “grade” of vowel perception is in general also dependent on the fundamental frequency of the sounds: the lower the fundamental frequency, the better the expected vowel perception; the higher, the worse.
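The widening of the harmonic spacing can be illustrated with a minimal sketch. It assumes an idealised, strictly harmonic spectrum; the function name `harmonic_frequencies`, the 3 kHz upper limit and the three fundamental frequencies are hypothetical choices made for illustration and are not taken from formant statistics.

```python
def harmonic_frequencies(f0_hz, max_hz):
    """Return the harmonic frequencies k * f0 (k = 1, 2, ...) up to max_hz.

    Assumes an idealised harmonic spectrum (illustrative simplification).
    """
    count = int(max_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


# Illustrative fundamental frequencies (hypothetical values).
for f0 in (100.0, 200.0, 400.0):
    harmonics = harmonic_frequencies(f0, 3000.0)
    print(f"f0 = {f0:.0f} Hz: {len(harmonics)} harmonics up to 3 kHz, "
          f"spacing {f0:.0f} Hz")
```

The higher the fundamental frequency, the fewer spectral points are available in a given frequency range from which an envelope and its maxima could be estimated.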

Indeed, considering vowel sounds at higher pitches, many scholars interpret these sounds as related to a spectral undersampling of the formants.

However, one does not only have to consider a general interrelation between fundamental frequency, harmonic spectrum, spectral envelope and expected formant frequencies, but also a formant-specific role within this interrelation: depending upon given statistical frequency values of vowel-specific formants, comparisons show that sounds at higher fundamental frequencies may in some cases exhibit frequencies and relative amplitude maxima of harmonics that correspond to the statistical formant frequencies for the vowels in question, whereas the frequencies of the harmonics of sounds at lower fundamental frequencies lie in between these formant frequencies. For the latter, the formants are subsequently expected to appear as envelope peaks either only indistinctly or not at all, and the corresponding vowel perception is expected to be impaired when compared to sounds at higher fundamental frequencies for which the frequencies of the harmonics match statistical vowel-specific formant frequencies.
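The formant-specific case described above can be sketched numerically. The example below assumes an idealised harmonic spectrum; the function name `nearest_harmonic`, the target frequency of 600 Hz and the two fundamental frequencies are hypothetical values chosen for illustration, not statistical formant data.

```python
def nearest_harmonic(f0_hz, target_hz):
    """Return (harmonic number, harmonic frequency, deviation from target).

    Assumes harmonics at integer multiples of f0 (illustrative simplification).
    """
    k = max(1, round(target_hz / f0_hz))
    freq = k * f0_hz
    return k, freq, abs(freq - target_hz)


# Hypothetical target formant frequency, for illustration only.
target = 600.0

# Higher f0: the third harmonic coincides with the target frequency.
print(nearest_harmonic(200.0, target))

# Lower f0: the target lies between the fourth (520 Hz) and fifth (650 Hz) harmonic.
print(nearest_harmonic(130.0, target))
```

In this constructed case, the sound at the higher fundamental frequency has a harmonic exactly at the assumed formant frequency, whereas the harmonics of the sound at the lower fundamental frequency lie on either side of it.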

Such reasoning leads to the assumption that there is not only a general but also a discontinuous relationship between the intelligibility of vowel sounds and their fundamental frequency: accordingly, vowel sounds at lower fundamental frequencies would, as a rule, be more intelligible than vowel sounds at higher frequencies, but vowel intelligibility would also depend upon the respective relationships between fundamental frequency, harmonic spectrum and vowel-specific formant patterns (as given in formant statistics).

In line with prevailing theory, the relationship between fundamental frequency, harmonic spectrum, spectral envelope and expected vowel-specific formant pattern leads to the same assumption that the “grade” of vowels differs in relation to vowel-specific acoustic characteristics.

However, as explained, everyday experience refutes such a generalised conclusion. Thus, a theory of vowels as elements of language that formulates an inherently qualitative and at the same time discontinuous relationship between fundamental frequency and vowel perception stands in contrast with the (possibly “sensational”) characteristic of a voiced element of language being independent of pitch within the range of intelligible speech.