Open access

Acoustics of the Vowel

Preliminaries

Dieter Maurer

It seems as if the fundamentals of how we produce vowels and how they are acoustically represented have been clarified: we phonate and articulate. Using our vocal cords, we produce a vocal sound or noise which is then shaped into a specific vowel sound by the resonances of the pharyngeal, oral, and nasal cavities, that is, the vocal tract. Accordingly, the acoustic description of vowels relates to vowel-specific patterns of relative energy maxima in the sound spectra, known as patterns of formants.
The intellectual and empirical reasoning presented in this treatise, however, gives rise to scepticism with respect to this understanding of the sound of the vowel. The reflections and materials presented provide reason to argue that, up to now, a comprehensible theory of the acoustics of the voice and of voiced speech sounds is lacking, and consequently, no satisfying understanding of vowels as an achievement and particular formal accomplishment of the voice exists. Thus, the question of the acoustics of the vowel – and with it the question of the acoustics of the voice itself – proves to be an unresolved fundamental problem.

7    Unsystematic Correspondence between Vowels, Patterns of Relative Spectral Energy Maxima and Formant Patterns

7.1    Inconstant Number of Vowel-Specific Relative Spectral Energy Maxima and Incongruence of Vowel-Specific Formant Patterns

As discussed in Section 3.1, sounds of back vowels and of /a–α/ can exhibit only one relative spectral energy maximum within their vowel-specific frequency range ≤ 1.5 kHz (≤ 2 kHz for some sounds of /a/), in contrast to other sounds of the same vowels, which have two such maxima. Consequently, the number of vowel-specific energy maxima is inconstant.
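The contrast between one and two relative maxima within the range ≤ 1.5 kHz can be illustrated with a minimal sketch. It assumes that a sound is given as a list of harmonic levels in dB; the function name `relative_maxima`, the rule that a harmonic counts as a maximum when it exceeds both neighbours, and the example levels are hypothetical choices made for illustration and are not taken from the measurements discussed here.

```python
def relative_maxima(f0_hz, levels_db, limit_hz=1500.0):
    """Return the frequencies (Hz) of harmonics that are local level maxima
    at or below limit_hz.

    levels_db[k] is the level of harmonic k + 1, i.e. of frequency (k + 1) * f0_hz.
    The first harmonic counts as a maximum if it exceeds the second.
    """
    peaks = []
    for k, level in enumerate(levels_db):
        freq = (k + 1) * f0_hz
        if freq > limit_hz:
            break
        left = levels_db[k - 1] if k > 0 else float("-inf")
        right = levels_db[k + 1] if k + 1 < len(levels_db) else float("-inf")
        if level > left and level > right:
            peaks.append(freq)
    return peaks


# Hypothetical harmonic levels (dB) at f0 = 150 Hz, invented for illustration only.
one_peak = [-12, -6, 0, -5, -11, -17, -22, -27, -31, -35, -38, -41]
two_peaks = [-12, -6, 0, -5, -9, -6, -3, -8, -20, -30, -36, -40]

print(relative_maxima(150.0, one_peak))   # [450.0]
print(relative_maxima(150.0, two_peaks))  # [450.0, 1050.0]
```

Both invented spectra share the lowest maximum at 450 Hz, but only the second shows a further maximum below 1.5 kHz, so a fixed count of maxima cannot be assumed for the two sounds.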

The spectral envelopes and formant patterns of such vowel sounds cannot in all cases be interpreted as “formant merging”: examples of sound pairs of back vowels can be observed for which both sounds exhibit the lowest spectral envelope peak at a similar frequency level, but only one of them has a pronounced second envelope peak within the frequency range mentioned. Then, the first spectral envelope peak of both sounds corresponds to the vowel quality in question, whereas the second spectral envelope peak may be linked to an additional “colouring” of that sound. However, it plays a marginal role in vowel perception and, in such a case, does not possess vowel-differentiating value.

For both sounds of such sound pairs, formant analyses using current methods may reveal two lower formants. However, for the sound of the pair that exhibits only one lower spectral envelope peak, the calculated F2 may prove highly contingent on the number of filters chosen, above all for sounds of children. In addition, its amplitude can be very low and its bandwidth very large, that is, far beyond the reference values given in the literature.
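How a calculated formant can depend on an analysis parameter may be sketched as follows. The sketch assumes that the “number of filters” corresponds to the order of a linear-prediction (LPC) analysis, which is an interpretation and not stated in the text. It uses the autocorrelation method with the Levinson-Durbin recursion and reports peaks of the resulting all-pole envelope rather than pole frequencies; all function names and the small white-noise correction are hypothetical illustration choices.

```python
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a, error) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient of step i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        error *= 1.0 - k * k
    return a, error


def envelope_peaks(a, sample_rate, n_points=512):
    """Frequencies (Hz) of local maxima of the all-pole envelope 1/|A|."""
    mags = []
    for m in range(n_points):
        w = math.pi * m / n_points
        re = sum(c * math.cos(w * i) for i, c in enumerate(a))
        im = sum(c * math.sin(w * i) for i, c in enumerate(a))
        mags.append(1.0 / math.hypot(re, im))
    step = sample_rate / (2.0 * n_points)
    return [m * step for m in range(1, n_points - 1)
            if mags[m - 1] < mags[m] > mags[m + 1]]


def peaks_by_order(signal, sample_rate, orders):
    """Envelope peaks of one and the same signal for several LPC orders."""
    result = {}
    for order in orders:
        r = autocorrelation(signal, order)
        r[0] *= 1.0001  # assumption: small white-noise correction for stability
        a, _ = levinson_durbin(r, order)
        result[order] = envelope_peaks(a, sample_rate)
    return result
```

Calling `peaks_by_order(signal, sample_rate, (8, 10, 12, 14))` on one windowed frame of a recorded vowel returns one list of envelope peaks per order. If a second peak below 1.5 kHz appears for some orders and not for others, the calculated F2 of that sound depends on the analysis setting rather than on the sound alone.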

With regard to front vowels, the frequency of observable second envelope peaks, and with it the calculated F2, can vary strongly. Because of this, there are examples of sound pairs of front vowels for which the second envelope peak and calculated F2 of one sound approach or even exceed the third envelope peak and calculated F3 of the other sound. (Such observations generally relate to sounds of speakers of different speaker groups, produced at similar fundamental frequencies. However, this can also be observed for the sounds of speakers of the same speaker group.)

Thus, it is not possible to designate a standard number of consecutive relative spectral energy maxima related to delimited frequency ranges that represent any given vowel. The same holds true for formants, although it is less obvious. There are also formant patterns of sounds of single vowels whose reciprocal correspondence of single formants is open to discussion.


The number of vowel-specific relative spectral energy maxima is inconstant, and formant patterns are incongruent in some cases.


7.2    Partial Lack of Manifestation of Vowel-Specific Relative Spectral Energy Maxima

In their vowel-specific range of the spectrum ≤ 1.5 kHz, sounds of back vowels and of /a–α/ produced at fundamental frequencies ≤ 350 Hz can exhibit series of harmonics with consistent, quasi-identical amplitudes. These vowel-specific parts of harmonic spectra seem to be “flat”, lacking any clearly distinctive relative energy maxima. Of special interest in this respect are the sounds of /a, α, ɔ, o/ in cases where the amplitudes of the first three to five harmonics are not markedly different.

In their vowel-specific range of the spectrum ≥ 1.5 kHz, sounds of front vowels produced at fundamental frequencies ≤ 350 Hz can also exhibit series of harmonics with consistent, quasi-identical amplitudes. Thus, what applies to back vowels and to /a–α / for their entire vowel-specific frequency range also applies to front vowels for the higher part of their vowel-specific frequency range.
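A “flat” spectral portion in the sense described above can be sketched as a simple check on the harmonics within a given frequency range. The text gives no numerical criterion for “quasi-identical” amplitudes, so the tolerance of 3 dB, the function name `is_flat`, and the example levels are assumptions made for illustration only.

```python
def is_flat(f0_hz, levels_db, low_hz, high_hz, tolerance_db=3.0):
    """Return True if all harmonics between low_hz and high_hz (inclusive)
    lie within tolerance_db of each other.

    levels_db[k] is the level of harmonic k + 1. The tolerance is an assumed
    value; fewer than two harmonics in the range are not treated as flat.
    """
    band = [level for k, level in enumerate(levels_db)
            if low_hz <= (k + 1) * f0_hz <= high_hz]
    if len(band) < 2:
        return False
    return max(band) - min(band) <= tolerance_db


# Hypothetical harmonic levels (dB) at f0 = 200 Hz, invented for illustration only.
flat_example = [-10, -9, -10, -11, -10, -9, -10, -25, -32, -38]
peaked_example = [-12, -6, 0, -5, -11, -17, -22, -27, -31, -35]

print(is_flat(200.0, flat_example, 0.0, 1500.0))    # True
print(is_flat(200.0, peaked_example, 0.0, 1500.0))  # False
```

For front vowels, the same check would be applied to the higher part of the vowel-specific range, for example with `low_hz=1500.0` and an upper limit chosen for the vowel in question.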

In addition, cases of such vowel-specific, “flat” spectral portions also exist for sounds produced at fundamental frequencies > 350 Hz, even if, in relation to the large frequency spacing of the harmonics, this generally remains limited to the sounds of the vowels /i, e, ε, a, α/. For certain fundamental frequencies of the sounds of /ɔ, o/, the first two harmonics can exhibit equal amplitudes.
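The role of the frequency spacing of the harmonics can be made concrete with simple arithmetic: the higher the fundamental frequency, the fewer harmonics fall into a given vowel-specific range. The following sketch lists the harmonics within the range ≤ 1.5 kHz for a few fundamental frequencies; the function name and the chosen fundamental frequencies are illustrative assumptions.

```python
def harmonics_in_range(f0_hz, low_hz, high_hz):
    """Return the harmonic frequencies n * f0_hz lying between low_hz and
    high_hz (inclusive)."""
    count = int(high_hz // f0_hz)  # highest harmonic number not above high_hz
    return [n * f0_hz for n in range(1, count + 1) if n * f0_hz >= low_hz]


for f0 in (200.0, 350.0, 500.0, 700.0):
    in_range = harmonics_in_range(f0, 0.0, 1500.0)
    print(f0, len(in_range), in_range)
# 200.0 -> 7 harmonics, 350.0 -> 4, 500.0 -> 3, 700.0 -> 2
```

At a fundamental frequency of 700 Hz, only two harmonics remain within the range ≤ 1.5 kHz, so that any statement about a “flat” portion or a relative maximum in this range rests on two spectral components.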

Also worth mentioning in this context are the sounds of back vowels and of /a–α/, which exhibit continuously decreasing amplitudes in the vowel-specific lower frequency range. In the spectra of these sounds, the first harmonic generally forms the actual spectral maximum.

Thus, the set of problems concerning a formulation of a general relationship between the perceived vowel quality and its physical representation based on a certain number of relative spectral energy maxima is again extended.


Spectral envelope maxima, as described in the literature, are not a precondition for the physical representation of vowels.


The relationship between “flat”, vowel-specific parts of sound spectra and calculated formant frequencies using current methods of analysis cannot be described in simple and general terms. The same holds true for the relationship between continuously decreasing amplitudes of the harmonics in the vowel-specific lower frequency range and calculated formant patterns. Therefore, the issue is left open to discussion here. However, it has to be considered as an additional methodological problem of formant analysis.

7.3    Addition: Resynthesis and Synthesis

Inconstancy in the number of vowel-specific relative spectral energy maxima, possible incongruence of formant patterns, and vowel sounds with “flat” or decreasing vowel-specific spectrum portions can all be replicated using resynthesis.

The same also applies to formant patterns or harmonic spectra not derived directly from natural vowel sounds.
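The text does not specify the method of resynthesis or synthesis. As one possible illustration, the following sketch uses plain additive synthesis, summing sinusoids at the harmonic frequencies with prescribed levels. The function name, the default parameters, the zero starting phases, and the example levels are assumptions; the sketch makes no claim about the perceived vowel quality of the result.

```python
import math


def synthesize_harmonic_tone(f0_hz, levels_db, duration_s=0.5, sample_rate=16000):
    """Additive synthesis: sum of sinusoids at k * f0_hz with the given levels.

    levels_db[k] is the level (dB) of harmonic k + 1. Harmonics at or above
    the Nyquist frequency are skipped. The output is normalised to a peak
    absolute value of 1.0.
    """
    partials = [(k + 1, 10.0 ** (level / 20.0))
                for k, level in enumerate(levels_db)
                if (k + 1) * f0_hz < sample_rate / 2.0]
    n_samples = int(duration_s * sample_rate)
    samples = []
    for t in range(n_samples):
        time_s = t / sample_rate
        samples.append(sum(amp * math.sin(2.0 * math.pi * n * f0_hz * time_s)
                           for n, amp in partials))
    peak = max((abs(s) for s in samples), default=0.0) or 1.0
    return [s / peak for s in samples]


# Hypothetical "flat" portion: seven harmonics of equal level, then a steep
# decrease (levels invented for illustration only).
flat_levels = [0, 0, 0, 0, 0, 0, 0, -20, -30, -40]
tone = synthesize_harmonic_tone(200.0, flat_levels, duration_s=0.25)
print(len(tone))  # 4000
```

The resulting list of samples can be written to a sound file, for example with Python's standard `wave` module, and compared by listening with a tone synthesised from a spectrum with pronounced relative maxima.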