Open access

Acoustics of the Vowel

Preliminaries

Dieter Maurer

It seems as if the fundamentals of how we produce vowels and how they are acoustically represented have been clarified: we phonate and articulate. Using our vocal cords, we produce a vocal sound or noise which is then shaped into a specific vowel sound by the resonances of the pharyngeal, oral, and nasal cavities, that is, the vocal tract. Accordingly, the acoustic description of vowels relates to vowel-specific patterns of relative energy maxima in the sound spectra, known as patterns of formants.
The intellectual and empirical reasoning presented in this treatise, however, gives rise to scepticism with respect to this understanding of the sound of the vowel. The reflections and materials presented provide reason to argue that, up to now, a comprehensible theory of the acoustics of the voice and of voiced speech sounds is lacking, and consequently, no satisfying understanding of vowels as an achievement and particular formal accomplishment of the voice exists. Thus, the question of the acoustics of the vowel – and with it the question of the acoustics of the voice itself – proves to be an unresolved fundamental problem.

List of Figures

1            Prevailing Theory

1.10       Illustration

2            Prevailing Empirical References

2.1         General References

M7         Unsystematic Correspondence between Vowels, Patterns of Relative Spectral Energy Maxima and Formant Patterns

M7.1      Inconstant Number of Vowel-Specific Relative Spectral Energy Maxima and Incongruence of Vowel-Specific Formant Patterns

Figures 1 to 3. Sounds of /a–α, o, u/ which exhibit only one relative spectral energy maximum within their vowel-specific frequency range ≤ c. 1.5 kHz.

Figure 1. Sounds produced by children.

Figure 2. Sounds produced by women.

Figure 3. Sounds produced by men.

Figures 4 to 6. Sounds of /a–α, o, u / which exhibit two relative spectral energy maxima within their vowel-specific frequency range ≤ c. 1.5 kHz.

Figure 4. Sounds produced by children.

Figure 5. Sounds produced by women.

Figure 6. Sounds produced by men.

Figure 7. Direct comparisons of sounds of back vowels with one or two relative spectral energy maxima ≤ c. 1.5 kHz.

Figures 8 and 9. Sound pairs of / i / and of /e /, each pair produced by speakers of one and the same age- and gender-related speaker group, with small differences in F0 and F1 but substantial differences in the higher vowel-related spectral range.

Figure 8. Sound pairs of / i /.

Figure 9. Sound pairs of /e /.

Figure 10. A sound pair of / i / and a corresponding pair of /e /, each pair comparing productions of a man and a child, with small differences in F0 and F1 but very pronounced differences in the higher vowel-related spectral ranges.

M7.2      Partial Lack of Manifestation of Vowel-Specific Relative Spectral Energy Maxima

Figures 11 and 12. Sounds of /a–α, o /, produced by children, women and men, which exhibit “flat” or “sloping” lower spectral portions < c. 1.5 kHz lacking a clearly determinable vowel-related peak.

Figure 11. Sounds of /a–α /.

Figure 12. Sounds of /o /.

Figures 13 and 14. Sounds of / i, e /, produced by children, women and men, which exhibit “flat” or “sloping” spectral portions in the frequency range of 1.5–5 kHz lacking a clearly determinable pattern of vowel-related peaks.

Figure 13. Sounds of / i /.

Figure 14. Sounds of /e /.

M8         Lack of Correspondence between Vowels and Patterns of Relative Spectral Energy Maxima or Formant Patterns

M8.1      Dependence of Vowel-Specific, Relative Spectral Energy Maxima and Lower Formants ≤ 1.5 kHz on Fundamental Frequency

M8.2      Vowel Perception at Fundamental Frequencies above Statistical Values of the Respective First Formant Frequency

M8.3      “Inversions” of Relative Spectral Energy Maxima and Minima and “Inverse” Formant Patterns in Sounds of Individual Vowels

Figures 11 to 13. Sounds of /a–α, o, u /, produced at different F0 by children, women and men, which exhibit “inverse” relative spectral maxima and minima in terms of “inverse” spectral envelope curves ≤ 1.5 kHz.

Figure 11. Sounds of /a–α /.

Figure 12. Sounds of /o /.

Figure 13. Sounds of /u /.

M9         Ambiguous Correspondence between Vowels and Patterns of Relative Spectral Energy Maxima or Formant Patterns or Complete Spectral Envelopes

M9.1      Ambiguous Patterns of Relative Spectral Energy Maxima and Ambiguous Formant Patterns

Figures 1 to 3. Comparisons of sounds of back vowels and of /a–α / produced by different speakers, and related model patterns of spectral peaks and/or of calculated formant frequencies.

Figure 1. Sounds of /a–α, o, u /; related model pattern = 600–1200 Hz.

Figure 2. Sounds of /a–α, o, u /; related model pattern = 600–1050 Hz.

Figure 3. Sounds of /a–α, o, u /; related model pattern = 660–1320 Hz.

Figures 4 to 8. Comparisons of sounds of back vowels and of /a–α / produced by single speakers, and related model patterns of spectral peaks and/or of calculated formant frequencies.

Figure 4. Three comparisons of sounds of /a–α, o, u / produced by a man and two women; related model pattern = 600–1200 Hz.

Figure 5. Two comparisons of sounds of /a–α, o / produced by a man (sounds sung by a tenor); related model pattern = 600–1200 Hz for the first comparison; similar spectral peaks and spectral envelopes for the second comparison.

Figure 6. Sounds of /a–α / and of /u /, produced by a woman, which exhibit comparable spectral envelopes < 1.5 kHz.

Figure 7. Sounds of /ɔ, o, u / produced by a woman; related model = one clear peak at c. 550 Hz.

Figure 8. Two comparisons of sounds of /o, u / produced by two children (age 12 and 6); related model patterns = one clear peak at c. 400 Hz (first sound pair) and at c. 520 Hz (second sound pair), respectively.

Figures 9 to 18. Comparisons of sounds of front vowels produced by different speakers, and related model patterns of spectral peaks and/or of calculated formant frequencies.

Figure 9. Sounds of /ø, e, y, i /; related model pattern = 330–2000 Hz.

Figure 10. Sounds of /ø, e, y, i /; related model pattern = 350–2150 Hz.

Figure 11. Sounds of /ø, e, y, i /; related model pattern = 420–2150 Hz.

Figure 12. Sounds of /ε, e, i /; related model pattern = 500–2250 Hz.

Figure 13. Sounds of /ε, e, i /; related model pattern = 600–2450 Hz.

Figure 14. Sounds of /e, i /; related model pattern = 400–2600 Hz.

Figure 15. Sounds of /ε, e, y/; related model pattern = 500–2000 Hz.

Figure 16. Sounds of /ε, ø, y/; related model pattern = 430–2000 Hz.

Figure 17. Sounds of /ε, ø, y/; related model pattern = 475–1900 Hz.

Figure 18. Sounds of /ε, y/; related model pattern = 650–1950 Hz.

Figures 19 to 21. Comparisons of sounds of front vowels produced by single speakers, and related model patterns of spectral peaks and/or of calculated formant frequencies.

Figure 19. Two comparisons of sounds of /ε, e, i / produced by two women; related model patterns = 510–2550 Hz and 600–2400 Hz, respectively.

Figure 20. Three comparisons of sounds of /e, i / produced by three children (age range 7 to 9); related model patterns = 450–3000 Hz and 400–3000 Hz, respectively.

Figure 21. Three comparisons of sounds of /ø, y/ produced by a man, a woman and a child (age 12); related model patterns = 320–1600 Hz, 320–2000 Hz, and 400–2000 Hz, respectively.

M9.3      Ambiguity and Individual Vowels

M10       Lack of Correspondence between Patterns of Relative Spectral Energy Maxima or Formant Patterns and Age- and Gender-Related Speaker Groups or Vocal-Tract Sizes

M10.1    Similar Patterns of Relative Spectral Maxima and Similar Formant Patterns ≤ 1.5 kHz for Different Age- and Gender-Related Speaker Groups or Vocal-Tract Sizes

Figures 1 to 6. Comparisons of sounds produced by single children, women and men at comparable levels of F0.

Figure 1. Sounds of /o /.

Figure 2. Sounds of /e /.

Figure 3. Sounds of /u /.

Figure 4. Sounds of / i /.

Figure 5. Sounds of /a /.

Figure 6. Sounds of /o /, including vocalisations of a professional opera singer (baritone).

Figures 7 to 10. “Inverted” age- or size-related differences in vowel-related lower spectral peak(s) and calculated F1 (and F2) for sounds produced by single children and men.

Figure 7. Sounds of /o /.

Figure 8. Sounds of /e /.

Figure 9. Sounds of /u /.

Figure 10. Sounds of / i /.

M10.A   Addition: Vowel Imitations by Birds

Figures 11 to 16. Vowel sounds in word context imitated by mynah birds.

Figure 11. Sounds of / i /.

Figure 12. Sounds of /e /.

Figure 13. Sounds of /a–α /.

Figure 14. Sounds of /o /.

Figure 15. Sounds of /u /.