Open access

Acoustics of the Vowel

Preliminaries

Dieter Maurer

It seems as if the fundamentals of how we produce vowels and how they are acoustically represented have been clarified: we phonate and articulate. Using our vocal cords, we produce a vocal sound or noise which is then shaped into a specific vowel sound by the resonances of the pharyngeal, oral, and nasal cavities, that is, the vocal tract. Accordingly, the acoustic description of vowels relates to vowel-specific patterns of relative energy maxima in the sound spectra, known as patterns of formants.
The intellectual and empirical reasoning presented in this treatise, however, gives rise to scepticism with respect to this understanding of the sound of the vowel. The reflections and materials presented provide reason to argue that, up to now, a comprehensible theory of the acoustics of the voice and of voiced speech sounds is lacking, and consequently, no satisfying understanding of vowels as an achievement and particular formal accomplishment of the voice exists. Thus, the question of the acoustics of the vowel – and with it the question of the acoustics of the voice itself – proves to be an unresolved fundamental problem.

3    Vowels and Number of Formants

3.1    Inconstant Number of Vowel-Specific Relative Spectral Energy Maxima in Sounds of Back Vowels and of /a–α/

As reported in the literature, when analysing samples of sounds of back vowels and of /a–α/, some sounds may exhibit only one distinct vowel-specific spectral envelope peak, whereas other sounds of the same vowels exhibit the expected two pronounced peaks.


Empirically, the number of vowel-specific relative spectral energy maxima proves to be inconstant for sounds of single vowels.


3.2    Inconstant Correspondence between Vowel-Specific Relative Spectral Energy Maxima and Calculated Vowel-Specific Formant Patterns

If sounds of back vowels and of /a–α/ exhibit only a single vowel-specific spectral envelope peak, according to the literature, formant analysis (e.g. using LPC analysis) often reveals two close formant frequencies. Such cases are therefore referred to as formant merging. It follows that, for the sounds in question, the spectral envelope peak and the calculated first two formants do not correspond to one another.

Yet, if sounds of back vowels and of /a–α/ exhibit two vowel-specific spectral envelope peaks, such a correspondence is generally found.

Thus, the observation of an inconstant number of vowel-specific spectral envelope peaks of sounds of one and the same vowel calls into question the fundamental relationship between spectral envelopes and calculated formants.


No direct parallelism exists between relative spectral energy maxima and calculated formants.


Consequently, formants prove to be constructs of a specific method of analysis (see Section 6.1).
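The merging case described in this section can be illustrated numerically. The following is a minimal sketch, not a reproduction of any analysis in the literature: it skips the LPC estimation step and starts directly from an all-pole model with two formants, which is assumed here to be the result of such an analysis. The sampling rate, the formant frequencies and the bandwidths are illustrative assumptions, as are all function names. The sketch counts the peaks of the resulting spectral envelope for two close, broad formants and for two well-separated, narrow formants.

```python
import cmath
import math

FS = 10_000  # sampling rate in Hz (assumed for illustration)


def pole_pair_magnitude(freq_hz, formant_hz, bandwidth_hz, fs=FS):
    """Magnitude response at freq_hz of one two-pole resonator (one formant)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)      # pole radius from bandwidth
    theta = 2 * math.pi * formant_hz / fs           # pole angle from frequency
    z_inv = cmath.exp(-1j * 2 * math.pi * freq_hz / fs)
    a = 1 - 2 * r * math.cos(theta) * z_inv + r * r * z_inv ** 2
    return 1.0 / abs(a)


def envelope(formants, n_points=512, fs=FS):
    """Spectral envelope of an all-pole model given as (frequency, bandwidth) pairs."""
    freqs = [k * (fs / 2) / (n_points - 1) for k in range(n_points)]
    env = []
    for f in freqs:
        mag = 1.0
        for formant_hz, bandwidth_hz in formants:
            mag *= pole_pair_magnitude(f, formant_hz, bandwidth_hz, fs)
        env.append(mag)
    return freqs, env


def envelope_peaks(formants):
    """Frequencies of the strict interior local maxima of the envelope."""
    freqs, env = envelope(formants)
    return [freqs[i] for i in range(1, len(env) - 1)
            if env[i - 1] < env[i] > env[i + 1]]


# Hypothetical values: two formants in both cases, only spacing and bandwidth differ.
merged = [(700, 300), (900, 300)]     # close and broad
separate = [(500, 80), (1500, 80)]    # far apart and narrow

for name, formants in (("merged", merged), ("separate", separate)):
    peaks = envelope_peaks(formants)
    print(name, "formants:", len(formants), "envelope peaks:", len(peaks))
```

Under these assumptions, both models contain two formants, yet the envelope of the first shows a single peak lying between the two formant frequencies, whereas the envelope of the second shows one peak per formant. The number of calculated formants is fixed by the model order, not by the number of visible envelope peaks.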

3.3    Inconstant Number of Vowel-Specific Relative Spectral Energy Maxima and of Calculated Vowel-Specific Formants

As shown in Part I, with regard to high front vowels and r-coloured front vowels of some languages, sounds belonging to these vowels can exhibit, in part, similar first and second lower spectral envelope peaks and formant analysis can reveal similar F1–F2. Thus, the sounds of the corresponding vowels are physically distinct only with regard to the third spectral envelope peak and the third formant, respectively.

For such languages, it follows that back vowels, as well as some of the front vowels, are physically describable in terms of different patterns of F1–F2, whereas the remaining front vowels have to be described only in terms of different patterns of F1–F2–F3.


Empirically, the number of vowel-specific relative spectral energy maxima and of calculated vowel-specific formants proves to be inconstant among different vowels.


With regard to spectral envelope peaks, then, the quality of some sounds of back vowels is represented by a single peak, the quality of other sounds of back vowels and sounds of some front vowels by two peaks and the quality of some front vowels by three peaks.

3.4    Addition: “Spurious” Formants

In the spectra of the sounds of certain speakers, an additional spectral envelope peak may occur between the expected first and second or second and third formants. According to the prevailing methodological rules for determining formants, this maximum is not interpreted as vowel-specific but as a characteristic of the voice of the speaker in question. Therefore, it is referred to as a “spurious” formant.

Such “spurious” spectral envelope peaks also need to be considered within the context of the inconstant number of vowel-specific spectral envelope peaks.

3.5    Addition: “Flat” Vowel Spectra

In the literature, some indications for possible vowel perception related to “flat” spectral parts, lacking any clearly distinctive relative energy maxima, are also given.

3.6    Addition: Inconstant Number of Vowel-Specific Formants in Synthesis

Synthetically produced and easily recognisable vowel sounds can be generated for most vowel qualities using three- and two-formant synthesis. For certain vowels, in particular for back vowels and /a–α/, this is also possible by way of a one-formant synthesis.

With regard to synthesised sounds perceived as belonging to one vowel quality, a comparison of the sounds with F1’–F2’ (two-formant synthesis) and the sounds with F1’–F2’–F3’ (three-formant synthesis) reveals differences for F2’, in particular for sounds of front vowels. Similarly, a comparison of the sounds with F1’ (one-formant synthesis) and the sounds with F1’–F2’ (two-formant synthesis) reveals differences for F1’. (However, in the corresponding comparative studies, the fundamental frequency used in the synthesis was not varied systematically.)

Synthesis thus confirms the inconstant number of observable vowel-specific formants. Further, synthesis involving different numbers of formants (different numbers of filters) indicates differences for F1’ or F2’, respectively, although the sounds in question are perceived as belonging to the same vowel.
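What "different numbers of filters" means in practice can be sketched with a simple cascade synthesiser. The following is an illustration under stated assumptions, not the procedure of the studies referred to: an impulse train serves as the source, each formant is a second-order digital resonator in the form commonly used in cascade formant synthesisers, and all frequencies, bandwidths and names are hypothetical placeholders rather than values reported for any vowel. In particular, the differing F1’ values in the one- and two-formant configurations only mimic the kind of difference described above.

```python
import math


def resonator(signal, formant_hz, bandwidth_hz, fs):
    """Second-order resonator: y[n] = g*x[n] + c1*y[n-1] + c2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    c1 = 2 * r * math.cos(2 * math.pi * formant_hz / fs)
    c2 = -r * r
    g = 1 - c1 - c2          # normalises the gain at 0 Hz to one
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = g * x + c1 * y1 + c2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesise(formants, f0_hz=120, n_samples=2000, fs=10_000):
    """Pass an impulse train at f0_hz through one resonator per formant."""
    period = round(fs / f0_hz)
    sound = [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]
    for formant_hz, bandwidth_hz in formants:
        sound = resonator(sound, formant_hz, bandwidth_hz, fs)
    return sound


# Placeholder (frequency, bandwidth) pairs in Hz, chosen only for illustration.
ONE_FORMANT = [(900, 100)]
TWO_FORMANTS = [(700, 100), (1100, 100)]
THREE_FORMANTS = [(700, 100), (1100, 100), (2500, 150)]

one = synthesise(ONE_FORMANT)
two = synthesise(TWO_FORMANTS)
three = synthesise(THREE_FORMANTS)
print(len(one), len(two), len(three))
```

The fundamental frequency is a single parameter here (`f0_hz`), so the systematic variation that the parenthetical remark above finds missing in the comparative studies would amount to repeating each configuration over a range of `f0_hz` values.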