Open access

Acoustics of the Vowel

Preliminaries

Dieter Maurer

It seems as if the fundamentals of how we produce vowels and how they are acoustically represented have been clarified: we phonate and articulate. Using our vocal cords, we produce a vocal sound or noise which is then shaped into a specific vowel sound by the resonances of the pharyngeal, oral, and nasal cavities, that is, the vocal tract. Accordingly, the acoustic description of vowels relates to vowel-specific patterns of relative energy maxima in the sound spectra, known as patterns of formants.
The intellectual and empirical reasoning presented in this treatise, however, gives rise to scepticism with respect to this understanding of the sound of the vowel. The reflections and materials presented provide reason to argue that, up to now, a comprehensible theory of the acoustics of the voice and of voiced speech sounds is lacking, and consequently, no satisfying understanding of vowels as an achievement and particular formal accomplishment of the voice exists. Thus, the question of the acoustics of the vowel – and with it the question of the acoustics of the voice itself – proves to be an unresolved fundamental problem.

Experiments

The treatise concludes with a list of possible experiments that allow for empirical exploration of the problems discussed here under laboratory conditions.

E1     Number of Relative Spectral Energy Maxima and Number of Formants

E1.1    Sounds of Back Vowels Showing only One Lower Spectral Peak ≤ 1.5 kHz

To do: ( i ) Find examples of sounds of back vowels, produced as voiced sounds in isolation, which show only one spectral peak ≤ 1.5 kHz. ( ii ) Perform a listening test.

Note: For most of the corresponding examples, LPC analysis yields two formants ≤ 1.5 kHz; however, you will find that the second formant is often weak (large second formant bandwidth, low second formant level). You also will find examples for which LPC analysis yields only one lower formant frequency. (Long vowels produced in some languages, such as Standard German, are particularly suited for such an experiment.)

Option: You may also perform resynthesis and perform a related second listening test.

Thesis: You will find many examples for which the vowel identification score is high.

Examples: See Section M7.1, Figures 1 to 3.
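
The peak count asked for in step ( i ) can be sketched in code. The following Python fragment is a minimal illustration and not part of the original procedure. It assumes that a spectrum is given as a list of harmonic levels in dB (first harmonic first), and it treats a harmonic as a spectral peak if its level exceeds that of its lower neighbour and is not below that of its upper neighbour. The function name `spectral_peaks`, this peak criterion and the example levels are assumptions made for the illustration only.

```python
def spectral_peaks(f0, levels_db, f_limit=1500.0):
    """Return the frequencies (Hz) of harmonics that form local level
    maxima at or below f_limit.

    f0        -- fundamental frequency in Hz
    levels_db -- levels of harmonics 1, 2, 3, ... in dB
    f_limit   -- upper frequency limit of the region inspected (Hz)
    """
    peaks = []
    n = len(levels_db)
    for i, level in enumerate(levels_db):
        freq = f0 * (i + 1)
        if freq > f_limit:
            break
        # Missing neighbours (below H1, above the last harmonic) count as -inf.
        left = levels_db[i - 1] if i > 0 else float("-inf")
        right = levels_db[i + 1] if i + 1 < n else float("-inf")
        if level > left and level >= right:
            peaks.append(freq)
    return peaks


# Hypothetical harmonic levels (dB) of a sound at F0 = 150 Hz.
example_levels = [50.0, 58.0, 52.0, 45.0, 40.0, 36.0, 33.0, 30.0, 28.0, 26.0]
print(spectral_peaks(150.0, example_levels))  # [300.0]
```

With this criterion, a spectrum whose harmonic levels fall continuously returns the fundamental as its only peak, so the same helper may serve as a first screening for the sounds sought in the other experiments of this group. Whether a peak counts as "pronounced" remains a matter of inspection and is not decided by the sketch.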

E1.2    Sounds of Back Vowels Showing only One Pronounced Lower Formant ≤ 1.5 kHz

To do: ( i ) From the sample investigated in the previous experiment, select examples of sounds of back vowels for which LPC analysis gives a weak second formant (high bandwidth, low level). ( ii ) Manipulate these sounds in terms of shaping the spectrum using bandpass filtering including filter slope variation, until LPC analysis gives only one formant ≤ 1.5 kHz. ( iii ) Perform a listening test.

Thesis: You will find examples for which the perceived vowel quality proves to be maintained for the manipulated sounds.

E1.3    Sounds of Single Front Vowels Showing Non-Corresponding F2 and F3

To do: ( i ) Find examples of sound pairs of the same intended front vowel, produced as voiced sounds in isolation at similar F0, for which F2 of the first sound is near or above F3 of the second sound. ( ii ) Perform a listening test.

Option: You may compare sounds produced by speakers of the same age and gender group as well as of different groups. You may also perform resynthesis, and perform a related second listening test. You may also investigate the roles of the higher formants in bandpass filtering single formants.

Thesis: You will find such examples of sound pairs equal in perceived vowel quality.

Examples: See Section M7.1, Figures 8 to 10.

E1.4    Sounds of Back Vowels Showing No Pronounced Spectral Peak ≤ 1.5 kHz

To do: ( i ) Find examples of sounds of back vowels, produced as voiced sounds in isolation, which show no pronounced spectral peak ≤ 1.5 kHz apart from the fundamental (“flat” spectra, or spectra exhibiting continuously decreasing amplitudes of the harmonics). ( ii ) Perform a listening test.

Thesis: You will find examples for which the score of vowel identification is high. Further, you may experience examples for which the calculation of F1–F2 depends on rather small amplitude variations of the first harmonics.

Examples: See Section M7.2, Figures 11 and 12.

E1.5    Sounds of Front Vowels Showing No Pronounced Spectral Peak > 2 kHz

To do: ( i ) Find examples of sounds of front vowels, produced as voiced sounds in isolation, which show no pronounced spectral peak > 2 kHz. ( ii ) Perform a listening test.

Thesis: You will find examples for which the vowel identification score is high.

Examples: See Section M7.2, Figures 13 and 14.

E2     Patterns of Relative Spectral Energy Maxima, Formant Patterns and Fundamental Frequency

E2.1    Sounds of Single Vowels Produced at Different F0 Exhibiting Different Spectral Peaks and Different Calculated Formant Patterns: Part 1, Dependence of Formant Patterns on F0

To do: ( i ) Select speakers with excellent vocal abilities. ( ii ) Investigate all long vowels of the language in question. ( iii ) Let the speakers produce single words (including word pairs forming minimal pairs), single syllables (including logatomes) and isolated vowel sounds for their entire range of F0 of possible vowel production. (iv) Perform a listening test. (v) Only select sounds with a high identification score. (vi) Perform spectral analysis and LPC analysis.

Options: You may need to train the speakers so that they indeed maintain the perceived vowel while altering F0. You may select professional singers, actresses and actors. You may give special attention to the entertainment sector, including voice-over. You may vary vocal effort. You may include resynthesis. You may also extract words or syllables or vowel nuclei from existing recordings.

Thesis: ( i ) You will obtain unsystematic results, above all depending on single speakers, F0 levels and vocal effort, frequency ranges of spectral peaks and formants, vowel qualities and additional spectral characteristics of the original sounds. ( ii ) However, for F0 > 200 Hz, the spectral peaks and the calculated lower formants will shift with rising F0 for a substantial part of your sample even if the perceived vowel quality remains the same. ( iii ) Whether or not you experience a systematic (and not speaker-related) impact of the syntactic or semantic context of the vowel sounds is left open here.

Examples: See Section M8.1, Figures 1 to 5.

E2.2    Sounds of Single Vowels Produced at Different F0 Exhibiting Different Spectral Peaks and Different Calculated Formant Patterns: Part 2, Vowel Intelligibility for Sounds at F0 > 500 Hz

To do: ( i ) Refer to the sounds of the previous experiment. ( ii ) Select the sounds at F0 > 500 Hz.

Thesis: ( i ) You will obtain different results related to the abilities and production styles or habits of the speakers. ( ii ) However, you will observe possible vowel perception up to F0 corresponding to the upper frequency limit of F1 for men and women as given in formant statistics.

Examples: See Section M8.1, Figures 2 and 3, and Section M8.2, Figures 6 and 7; see also the pitch contours in Section M8.2, Figures 8 to 10.

E2.3    Sounds of Single Vowels Produced at Different F0 Exhibiting Different Spectral Peaks and Different Calculated Formant Patterns: Part 3, Resynthesising a Formant Pattern at Different F0

To do: ( i ) Refer to the sounds of experiment E2.1. ( ii ) Select two sounds of one vowel exhibiting very different F0 and different spectral peaks or (lower) formants, respectively. ( iii ) Concatenate these two sounds and insert a pause between them. If necessary, equalise loudness. (iv) Perform resynthesis of the concatenated sound, applying three conditions for F0. Firstly, use F0 of the original sounds; secondly, fix F0 to the value of the original sound at lower F0; thirdly, fix F0 to the value of the original sound at higher F0. (v) Perform a listening test including all sound pairs.

Options: Instead of concatenating two sounds, a singer or speaker with high vocal ability may perform a glissando, and resynthesis is performed at original (altering) F0, fixed F0 corresponding to the lowest, and fixed F0 corresponding to the highest F0 values of the original sound. However, during the production of the glissando, the vowel quality must be strictly maintained.

Thesis: ( i ) You will obtain unsystematic results (see above). ( ii ) However, you will find many cases for which the original sounds of a pair as well as the resynthesised sounds, for which the first condition mentioned applies, are perceived as the same vowel, but the resynthesis applying the second and third condition produces a change in vowel perception between the two sounds of a pair.
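
What it means to keep a formant pattern fixed while altering F0 can be illustrated with a strongly simplified sketch. The Python fragment below is not the resynthesis procedure used for this treatise; it assumes a cascade of second-order resonances (one per formant, each normalised to unity gain at 0 Hz) and ignores the spectral slope of the voice source and the radiation characteristic. The function names and the formant values in the example are hypothetical. The sketch only shows which harmonics of a given F0 sample the fixed envelope, and at which level.

```python
import math


def resonance_gain(f, formant_hz, bandwidth_hz):
    """Magnitude response at frequency f of one second-order resonance
    with the given centre frequency and bandwidth (unity gain at 0 Hz)."""
    half_bw = bandwidth_hz / 2.0
    numerator = formant_hz ** 2 + half_bw ** 2
    denominator = (math.hypot(f - formant_hz, half_bw)
                   * math.hypot(f + formant_hz, half_bw))
    return numerator / denominator


def harmonic_levels_db(f0, formants, f_max=4000.0):
    """Sample a fixed formant pattern at the harmonics of f0.

    formants -- list of (frequency_hz, bandwidth_hz) pairs
    Returns a list of (harmonic_frequency_hz, level_db) pairs up to f_max.
    """
    levels = []
    h = 1
    while h * f0 <= f_max:
        f = h * f0
        gain = 1.0
        for formant_hz, bandwidth_hz in formants:
            gain *= resonance_gain(f, formant_hz, bandwidth_hz)
        levels.append((f, 20.0 * math.log10(gain)))
        h += 1
    return levels


def strongest_harmonic(f0, formants, f_max=4000.0):
    """Frequency (Hz) of the harmonic with the highest level."""
    return max(harmonic_levels_db(f0, formants, f_max), key=lambda p: p[1])[0]


# Hypothetical single-resonance "pattern" at 500 Hz, bandwidth 50 Hz.
pattern = [(500.0, 50.0)]
print(strongest_harmonic(100.0, pattern, 1000.0))  # 500.0
print(strongest_harmonic(300.0, pattern, 1000.0))  # 600.0
```

In this simplified model, the same pattern yields a dominant harmonic at 500 Hz for F0 = 100 Hz but at 600 Hz for F0 = 300 Hz. The sketch says nothing about perception; that is what the listening test in step (v) is for.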

E2.4    Sounds of Single Back Vowels Produced at Different F0 Exhibiting Inverse Spectral Peaks

To do: Refer to experiment E2.1 but, in particular, consider sound pairs of a back vowel which differ in F0 and exhibit an “inversion” of spectral peaks, that is, the first relative spectral energy maximum (corresponding to its F1) for the sound at higher F0 is found at a frequency level of a relative spectral minimum for the sound at lower F0, in between the first and second spectral peak (in between the F1 and F2) of the latter. Consider also resynthesis and identification scores.

Thesis: You will find many cases for which the sounds of such pairs are perceived as the same vowel.

Examples: See Section M8.3, Figures 11 to 13.

E2.5    Special Note Concerning Inconstant Numerical Relationship between Calculated F0 and Formant Patterns

To do: ( i ) Refer to sounds at very different F0, above all to sounds of the vowels /e/ and /o/. Include sounds produced with different vocal effort. ( ii ) Perform a listening test. ( iii ) Select only sounds with a high identification score. (iv) Calculate formant patterns for these sounds. (v) Perform resynthesis. (vi) Perform a listening test with the resynthesised sounds.

Thesis: ( i ) You may observe sound pairs for which F1 or F1–F2 of the sound at lower F0 is higher than F1 or F1–F2 at higher F0, thus seemingly indicating an “inverse” dependence of lower formants and F0. ( ii ) You may also note that resynthesis seems to confirm this observation.

( iii ) However, you will have to relate such observations to a limited frequency range of F0; differences in vocal effort may have a strong influence on formant estimation, and you will have to consider methodological aspects of LPC analysis.

Examples: See Section M8.1, Figure 4 for an indication.

E3     Formant Pattern Ambiguity

E3.1    Formant Pattern Ambiguity in Natural Vocalisations

To do: ( i ) Select speakers from all three speaker groups with excellent vocal abilities. ( ii ) Let them produce isolated sounds of long vowels at very different F0. Vary vocal effort, for example medium, low and high vocal effort. Investigate a frequency range of F0 of 220 to 700 Hz for children, 175 to 880 Hz for women and 110 to 523 Hz for men. Investigate different F0 step by step (you may refer to a musical scale). ( iii ) Perform spectral analysis and formant pattern analysis. With regard to the latter, you may perform the analysis also for F0 > 350 Hz even if there is a lack of methodological substantiation. (iv) Perform a listening test.

Thesis: ( i ) You will find unsystematic results (see above). ( ii ) However, comparing the vowel-related patterns of spectral peaks and of formants of the sounds of a single speaker, you will find many examples of similar patterns for sounds at different F0 and two different perceived vowel qualities. You may even encounter examples of such patterns for three vowels. ( iii ) The same holds true in an extended way for a corresponding comparison of the sounds of different speakers. (iv) You will not be able, in general terms, to directly relate such a pattern ambiguity to differences in speaker group or vocal effort.

Examples: See Section M9.1, Figures 1 to 21.

E3.2    Formant Pattern Ambiguity in Model Synthesis

To do: ( i ) Refer to the sounds in experiment E3.1. ( ii ) Select sounds of different vowels for which—apart from differences in F0 and the frequency distance of the harmonics—a direct comparison of the vowel-related spectral region as well as the corresponding spectral peaks and formant patterns can be considered similar, according to prevailing consideration in phonetics. ( iii ) Use the related formant patterns (including formant bandwidths) as models for vowel synthesis. (iv) Perform vowel synthesis for the entire range of the F0 you have investigated in the previous experiment. (v) Perform a listening test.

Thesis: You will observe that, for selected formant patterns of natural vocalisations that prove to be ambiguous in vowel representation, the alteration of F0 in such a model synthesis generally produces a clear and sometimes very pronounced change in perceived vowel quality.

E4     Patterns of Relative Spectral Energy Maxima, Formant Patterns and Age- and Gender-Related Vocal-Tract Sizes

E4.1    Comparison of Vowel-Specific Spectral Characteristics of Children, Women and Men Related to Different and Similar F0 of Vocalisations: Part 1, Natural Vocalisations

To do: ( i ) Select a child, a woman and a man with excellent vocal abilities. ( ii ) Let them produce isolated sounds of long vowels at different F0 according to the C-major scale, for example starting from 220 Hz for the child, from 175 Hz for the woman and from 131 Hz for the man. Investigate a range of F0 up to 523 Hz. Ensure that the sounds correspond with each other perceptually, not only in vowel quality but also in “vowel-colour” variant, which makes for the greatest possible correspondence as regards perception (exclusion of age- and gender-related “dialects”). ( iii ) Perform a listening test. (iv) Perform spectral analysis and compare the spectra and the spectral peaks of the sounds of a single vowel. (v) In parallel, perform formant analysis and compare the formant patterns of the sounds of a single vowel.

Option: You may proceed in a similar way with several speakers from the three speaker groups so as to re-examine formant statistics. However, you will not be able to control the correspondences of the vowel qualities as precisely as in an investigation of the utterances of three single speakers.

Thesis: ( i ) With regard to the spectral characteristics in general and the spectral peaks in particular, you will find the expected differences which are in line with the numbers given in formant statistics for citation-form words, if the F0 of the sounds also concurs with the F0 of the statistics in question, that is, c. 262 Hz for the child, c. 220 Hz for the woman and c. 131 Hz for the man (levels given according to the C-major scale). ( ii ) However, you will observe that spectral differences ≤ 1.5 kHz decrease or disappear if the speakers vocalise at a similar F0. ( iii ) You will even observe cases of “inversions” of expected age- and gender-related spectral differences in terms of higher spectral peaks ≤ 1.5 kHz for the sounds of the two adults than for the sounds of the child, if the F0 of the former are also higher than of the latter. The same will hold true for the comparison of sounds of the man with sounds of the woman at correspondingly different F0. (iv) With regard to calculated formant patterns, you will observe similar behaviour. However, methodological problems of analysis will interfere. (v) With regard to formant statistics, you will not be able to resolve the methodological problem of formant pattern analysis at F0 > 350 Hz. Moreover, you will have to consider possible age- and gender-related vowel colouring (age- and gender-related “dialects”). However, for sounds > 220 Hz, you will no longer find a clear indication of generalised age- and gender-related formant patterns < 1.5 kHz, if the F0 of the sounds correspond.

Examples: See Section M10.1, Figures 1 to 10.
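
The F0 levels of such a scale can be listed with a short script. The sketch below assumes equal temperament with A4 = 440 Hz and MIDI note numbering (C4 = 60); under this assumption, the rounded values reproduce the levels quoted above (131, 175, 220, 262 and 523 Hz). The function names are chosen for this illustration only.

```python
A4_HZ = 440.0                                # assumed reference pitch
C_MAJOR_PITCH_CLASSES = {0, 2, 4, 5, 7, 9, 11}  # C D E F G A B
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]


def midi_to_hz(midi_note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69)."""
    return A4_HZ * 2.0 ** ((midi_note - 69) / 12.0)


def c_major_steps(first_midi, last_midi):
    """List (note_name, frequency_hz) for all C-major notes in the range."""
    steps = []
    for m in range(first_midi, last_midi + 1):
        if m % 12 in C_MAJOR_PITCH_CLASSES:
            name = NOTE_NAMES[m % 12] + str(m // 12 - 1)
            steps.append((name, midi_to_hz(m)))
    return steps


# F0 steps from C3 (c. 131 Hz) to C5 (c. 523 Hz), as for the man.
for name, hz in c_major_steps(48, 72):
    print(name, round(hz))
```

For the woman and the child, the same function may be called with a first note of 53 (F3, c. 175 Hz) or 57 (A3, 220 Hz), respectively.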

E4.2    Comparison of Vowel-Specific Spectral Characteristics of Children, Women and Men Related to Different and Similar F0 of Vocalisations: Part 2, Resynthesis

To do: ( i ) Select the sounds of the three single speakers of the previous experiment. ( ii ) Resynthesise them on the basis of formant analysis but, for each single formant pattern of a single vocalisation, perform resynthesis for all F0 levels on which the speaker produced vowel sounds. ( iii ) Perform a listening test.

Thesis: ( i ) If resynthesis is performed applying F0 and formant patterns of the original sounds, in general, the perceived vowel quality will not change. ( ii ) If only the formant patterns correspond to the original sounds but F0 is varied according to the F0-range of the natural sounds, you will obtain unsystematic results (see above). However, for some of the vowels investigated and for F0 of the sounds > 200 Hz, for all three speakers, you will find many examples of sounds for which the perceived vowel quality changes with changing F0.

E5     Patterns of Relative Spectral Energy Maxima, Formant Patterns and Phonation Types

E5.1    Whispered Sounds Compared with Voiced Sounds at Different F0 in Utterances of a Single Speaker

To do: ( i ) Select a speaker with good vocal abilities. ( ii ) Let the speaker produce isolated whispered sounds of the long vowels of their language. ( iii ) Then, let the speaker produce voiced sounds of these vowels at different levels of F0. (You may refer to a musical scale.) Investigate a range of F0 up to at least 523 Hz. (You may refer to utterances of a woman.) Pay attention to the close correspondence of the produced vowel qualities and vowel colours. (iv) Perform spectral analysis and formant analysis. (v) Perform resynthesis, according to the following conditions: for a given formant pattern of a single sound produced, as source characteristic, apply all F0 investigated as well as noise. (vi) Perform a listening test.

Thesis: ( i ) You will find unsystematic results (see above). ( ii ) When comparing whispered sounds with voiced sounds at lower F0, in many cases, you will find indications of higher spectral peaks ≤ 1.5 kHz and higher frequencies of calculated F1 and F2 for the former than the latter, as is indicated in formant statistics for citation-form words. ( iii ) However, you will also find many cases in which such differences decrease or even disappear if the F0 of the voiced vowel sound is raised. (iv) In parallel, often, no change in vowel perception will be found for a resynthesis using formant patterns of whispered sounds but higher F0 of voiced sounds. (v) In parallel, as mentioned above, a change in vowel perception will often be found for resynthesising formant patterns of voiced sounds with regard to all F0 investigated.

E5.2    Whispered Sounds Compared with Voiced Sounds at Different F0 in Utterances of Speakers of Different Speaker Groups

To do: Redo the previous experiment for three speakers, a child, a woman and a man. (You may refer to the three speakers and the sounds of experiment E4.1.)

Thesis: In addition to the results predicted for experiment E4.1, you can question the so-called speaker group differences. Above all, for a given vowel, you may find correspondences of formant patterns of a voiced sound of a child when compared to a whispered sound of an adult, and vice versa.

E5.3    Sounds of Back Vowels Showing Three Spectral Peaks ≤ 1.5 kHz

To do: ( i ) Search for examples of sounds of back vowels, produced as whispered sounds in isolation, which show three spectral peaks ≤ 1.5 kHz. Also search for correspondingly produced examples that only show two peaks ≤ 1.5 kHz. ( ii ) Perform a listening test.

Thesis: You will find examples of the first kind for which the identification score is as high as for the examples of the second kind.

E5.4    Sounds of Front Vowels Showing Two Spectral Peaks ≤ 1.5 kHz

To do: ( i ) Search for examples of sounds of front vowels, produced as whispered sounds in isolation, which show two spectral peaks ≤ 1.5 kHz. Also search for correspondingly produced examples that show only one peak ≤ 1.5 kHz. ( ii ) Perform a listening test.

Thesis: You will find examples of the first kind for which the identification score is as high as for the examples of the second kind.

E6     Patterns of Relative Spectral Energy Maxima, Formant Patterns and Vowel Imitation by Birds

E6.1    Direct Comparisons of Selected Sounds of Humans and Birds

To do: ( i ) Create a sample of imitated words produced by birds, for example common hill myna birds. ( ii ) Select the best examples with regard to intelligibility. ( iii ) Isolate the sound nuclei corresponding to a vowel. (iv) Perform a listening test. (v) Select the sounds with a high score of consistent vowel perception. (vi) Let a woman and a man in turn imitate the “words” of the birds at the corresponding F0, and isolate the vowel sound nuclei. (vii) Perform a second listening test for all the sounds compared. (viii) Perform spectral analysis and formant pattern analysis. Concerning the sounds of the birds, even if methodologically unsubstantiated, you may apply both standard parameter settings for females and for males.

Thesis: ( i ) You will be able to observe examples in which a bird can produce a sound with a formant pattern F1–F2–F3 that corresponds to the formant pattern of a woman or a man. ( ii ) You will also be able to observe examples for which the sound of a bird does exhibit only a part of the formant patterns produced by the woman or man, yet vowel perception is not impaired.

E6.2    Resynthesis Relating to “Anomalous” Formant Patterns of Sounds of Birds

To do: ( i ) Select the sounds of the birds of the previous experiment with intelligible vowel quality but only partial correspondence of the formant patterns compared with the sounds of the man or the woman. ( ii ) Perform resynthesis. ( iii ) Perform a listening test.

Thesis: You will be able to resynthesise these sounds related to “anomalous” formant patterns with no substantial change in perceived vowel quality compared with the natural sounds.

E7     Anomalous Vowel Spectra

E7.1    Spectra with Increasing Number of Harmonics Equal in Amplitude (“Flat” Vowel Spectra)

To do: ( i ) Perform vowel synthesis using a harmonic synthesiser, that is, create harmonic spectra, perform an inverse Fourier transform and repeat the periods obtained over time for a certain duration, for example 1 s. ( ii ) Investigate sounds at F0 of 110 Hz and 220 Hz separately. ( iii ) Start a synthesis with only the first harmonic or fundamental at a given F0. Then continue to add, step by step, harmonics 2, 3, 4, etc. equal in amplitude to the fundamental. (iv) Perform a listening test.

Option: You may also investigate F0 other than the two frequency levels mentioned.

Thesis: You will find some sounds in the sound series created for which the listening test gives a vowel identification of one of the vowels /u/, /o/, /ɔ/ and /a/. Possibly, /ε/ is also perceived.

Extension: You may extend the investigation to front vowels concerning “flat” spectral parts > c. 2 kHz. (Try also “flat” spectral parts > c. 1.5 kHz.) You may then start with a series of lower harmonics as found in natural vocalisations and add, step by step, harmonics equal in amplitude from c. 2 kHz (or from c. 1.5 kHz) upwards.
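
A harmonic synthesiser of this kind can be sketched with the Python standard library alone. The fragment below is a minimal illustration under stated assumptions: instead of an explicit inverse Fourier transform of one period, it sums sine components directly (additive synthesis), which yields the same periodic signal for a harmonic spectrum; all harmonics start in sine phase, and the sum is scaled so that it cannot exceed full scale. The function names, the sampling rate and the phase choice are assumptions of this sketch and are not specified in the text.

```python
import math
import struct
import wave


def flat_harmonic_sound(f0, n_harmonics, fs=44100, duration=1.0):
    """Synthesise a sound whose first n_harmonics harmonics are equal in
    amplitude. Returns a list of samples in the range -1.0 to 1.0."""
    nyquist = fs / 2.0
    # Keep only harmonics below the Nyquist frequency to avoid aliasing.
    harmonics = [h for h in range(1, n_harmonics + 1) if h * f0 < nyquist]
    if not harmonics:
        raise ValueError("no harmonic lies below the Nyquist frequency")
    n_samples = round(fs * duration)
    scale = 1.0 / len(harmonics)
    samples = []
    for n in range(n_samples):
        phase = 2.0 * math.pi * f0 * n / fs
        samples.append(scale * sum(math.sin(h * phase) for h in harmonics))
    return samples


def write_wav(path, samples, fs=44100):
    """Store the samples as a 16-bit mono WAV file."""
    frames = struct.pack("<%dh" % len(samples),
                         *(int(32767 * s) for s in samples))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(fs)
        wav.writeframes(frames)
```

For example, `write_wav("flat_110Hz_5.wav", flat_harmonic_sound(110.0, 5))` would store the fifth sound of the series at F0 = 110 Hz (the file name is hypothetical). Note that the scaling lowers the level of each harmonic as harmonics are added; for the listening test, you may prefer to equalise loudness across the series instead.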

E7.2    Spectra with Increasing Number of Harmonic Pairs Showing Equal Amplitude Differences (“Ridged” Parts of Vowel Spectra)

To do: Apply the same procedure as described in the previous experiment but add, step-by-step, harmonics with periodic increasing and decreasing amplitudes; for example L2 (level of second amplitude) < L1 (level of first amplitude), L3 = L1, L4 = L2, and so on; or vice versa.

Thesis: ( i ) You will obtain results depending on the extent of the difference in the harmonic level you have set. ( ii ) However, within a limited range of such a difference, the listening test will provide similar results to those predicted for the previous experiment.

Extension: You may again extend the investigation to front vowels concerning “ridged” spectral parts > 2 kHz.
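
The alternating level pattern described above (L2 < L1, L3 = L1, L4 = L2, and so on, or vice versa) can be generated as follows. This is a small sketch only; the function names and the default level difference of 6 dB are hypothetical, since the text leaves the extent of the difference open.

```python
def ridged_levels(n_harmonics, high_db=0.0, diff_db=6.0, start_high=True):
    """Return the levels (dB) of harmonics 1..n_harmonics, alternating
    between high_db and high_db - diff_db.

    start_high=True  gives L1 high, L2 low, L3 high, ...
    start_high=False gives the inverse pattern.
    """
    levels = []
    for h in range(1, n_harmonics + 1):
        is_high = (h % 2 == 1) == start_high
        levels.append(high_db if is_high else high_db - diff_db)
    return levels


def db_to_amplitude(level_db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


print(ridged_levels(4))                    # [0.0, -6.0, 0.0, -6.0]
print(ridged_levels(4, start_high=False))  # [-6.0, 0.0, -6.0, 0.0]
```

The linear factors obtained with `db_to_amplitude` may then be used as the amplitudes of the individual harmonics in a harmonic synthesiser.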

E8     Aspects of Method

E8.1    Formant Pattern Estimation Related to Non-Standard Parameters

To do: ( i ) Refer to a large sample of isolated vowel sounds. ( ii ) Perform LPC analysis applying standard parameters. ( iii ) Select the sounds for which the calculated formant patterns clearly do not correspond to what is expected if referred to formant statistics. However, do not try to include very high F0. (iv) Perform a listening test. (v) Select only sounds with a high score of identification. (vi) Perform LPC analysis again but alter the parameters.

Option: You may also perform resynthesis.

Thesis: ( i ) You will find various examples for which LPC analysis based on non-standard parameters as given in the literature—above all based on a non-standard maximum number of formants for a given frequency range, which is usually related to age and gender of the speaker—provides “better” (that is, more “expected”) results than LPC analysis based on standard parameters. ( ii ) However, you will not be able to relate this finding to a general production characteristic for all vowel sounds produced by a single speaker.
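
How the parameter "maximum number of formants" enters the analysis can be made concrete with a bare-bones LPC sketch. The fragment below is not the analysis used for this treatise: it applies the autocorrelation method with the Levinson-Durbin recursion (analysis programs often use other variants, for example the Burg method), omits windowing and pre-emphasis, and reads resonance candidates off the peaks of the all-pole envelope instead of solving for the roots of the predictor polynomial. The link "LPC order = 2 × maximum number of formants" is a common convention and is stated here as an assumption. All function names are hypothetical.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sample sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.
    Returns the coefficients [1, a1, ..., a_order] of A(z)."""
    if r[0] <= 0.0:
        raise ValueError("signal has no energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= (1.0 - k * k)                # remaining prediction error
    return a


def envelope_peaks(a, fs, f_max, step=10.0):
    """Frequencies (Hz) of local maxima of the envelope 1/|A| on a grid."""
    freqs = [i * step for i in range(int(f_max / step) + 1)]
    mags = []
    for f in freqs:
        z = cmath.exp(-2j * math.pi * f / fs)
        mags.append(1.0 / abs(sum(c * z ** k for k, c in enumerate(a))))
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]]


def formant_candidates(x, fs, max_formants, f_max):
    """Resonance candidates for a given maximum number of formants."""
    order = 2 * max_formants                # assumed convention
    a = levinson_durbin(autocorrelation(x, order), order)
    return envelope_peaks(a, fs, f_max)
```

Calling `formant_candidates` for the same sound with different values of `max_formants` shows how the number and the frequencies of the resonance candidates depend on this parameter, which is the manipulation step (vi) asks for.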

E8.2    Formant Pattern Estimation at F0 > 350 Hz

To do: ( i ) Select isolated vowel sounds produced at F0 > 350 Hz. ( ii ) Perform a listening test. ( iii ) Select sounds with a high identification score. (iv) Perform LPC analysis using standard parameters. (v) On the basis of the corresponding results, perform resynthesis. (vi) Perform a listening test related to the resynthesised sounds.

Thesis: ( i ) You will find variable results. ( ii ) However, you will find many examples for which the natural and the resynthesised sounds are perceived as the same vowel, although the LPC analysis is not methodologically substantiated and the calculated formant pattern may differ strongly from values given in formant statistics.
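
One way to see why formant estimation becomes delicate at high F0 is to count how many harmonics are available to outline the spectrum below 1.5 kHz. The following lines are an illustration added here, not an argument taken from the text; the function name is hypothetical.

```python
def harmonics_below(f0, f_limit=1500.0):
    """Number of harmonics of f0 at or below f_limit (both in Hz)."""
    return int(f_limit // f0)


for f0 in (110.0, 220.0, 350.0, 523.0, 880.0):
    print(f0, harmonics_below(f0))
```

At F0 = 110 Hz, thirteen harmonics lie at or below 1.5 kHz; at 350 Hz, only four remain; at 523 Hz, two; and at 880 Hz, a single one.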

E8.3    Resynthesis of Sounds at Varying F0 and Subsequent Formant Pattern Estimation

To do: ( i ) Select isolated natural vowel sounds. ( ii ) Perform a listening test. ( iii ) Select only sounds with a high identification score. (iv) Perform LPC analysis using standard parameters. (v) On the basis of the corresponding results, perform resynthesis for two conditions; first, use F0 of the natural vocalisation; second, use a very different F0 level. (vi) Perform a listening test again and select again only sounds with a high identification score. (vii) Perform LPC analysis for both types of the resynthesised sounds.

Thesis: ( i ) You will find variable results. ( ii ) However, you will find many examples for which the calculated formant pattern of the resynthesised sounds differs substantially from the original formant pattern, if the F0 of the resynthesised and the natural sounds also differs substantially.