The intellectual and empirical reasoning presented in this treatise, however, gives rise to scepticism with respect to this understanding of the sound of the vowel. The reflections and materials presented provide reason to argue that, up to now, a comprehensible theory of the acoustics of the voice and of voiced speech sounds is lacking, and consequently, no satisfying understanding of vowels as an achievement and particular formal accomplishment of the voice exists. Thus, the question of the acoustics of the vowel – and with it the question of the acoustics of the voice itself – proves to be an unresolved fundamental problem.
3 Vowels and Number of Formants
As reported in the literature, when analysing samples of sounds of back vowels and of /a–α/, some sounds may exhibit only one distinct vowel-specific spectral envelope peak, whereas other sounds of the same vowels exhibit the expected two pronounced peaks.
Empirically, the number of vowel-specific relative spectral energy maxima proves to be inconstant for sounds of single vowels.
According to the literature, if sounds of back vowels and of /a–α/ exhibit only a single vowel-specific spectral envelope peak, formant analysis (e.g. LPC analysis) often reveals two closely spaced formant frequencies. Such cases are therefore referred to as formant merging. It follows that, for the sounds in question, the single spectral envelope peak and the first two calculated formants do not correspond to one another.
Yet, if sounds of back vowels and of /a–α/ exhibit two vowel-specific spectral envelope peaks, such a correspondence is generally found.
Thus, the observation of an inconstant number of vowel-specific spectral envelope peaks of sounds of one and the same vowel calls into question the fundamental relationship between spectral envelopes and calculated formants.
No direct parallelism exists between relative spectral energy maxima and calculated formants.
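The mismatch described above can be illustrated numerically. The following Python sketch is not taken from the literature referred to. It assumes an idealised all-pole filter with two resonances, it skips the LPC estimation step itself, and it simply compares the number of peaks of the resulting spectral envelope with the number of resonance frequencies recovered from the filter polynomial. The sampling rate, the resonance frequencies, the bandwidths and all function names are illustrative assumptions.

```python
import numpy as np

FS = 10_000  # sampling rate in Hz (assumed for illustration)

def resonance_poly(freq_hz, bw_hz, fs=FS):
    """Denominator coefficients of one second-order resonance."""
    r = np.exp(-np.pi * bw_hz / fs)      # pole radius derived from the bandwidth
    theta = 2 * np.pi * freq_hz / fs     # pole angle derived from the frequency
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])

def all_pole_model(freqs_hz, bw_hz, fs=FS):
    """Cascade several resonances into one all-pole denominator polynomial."""
    a = np.array([1.0])
    for f in freqs_hz:
        a = np.convolve(a, resonance_poly(f, bw_hz, fs))
    return a

def envelope_peaks(a, fs=FS, n_points=4096):
    """Frequencies (Hz) of the local maxima of the envelope 1/|A(e^jw)|."""
    w = np.linspace(0.0, np.pi, n_points)
    k = np.arange(len(a))
    A = np.exp(-1j * np.outer(w, k)) @ a   # A evaluated on the unit circle
    mag = 1.0 / np.abs(A)
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    return w[1:-1][is_peak] * fs / (2 * np.pi)

def model_formants(a, fs=FS):
    """Resonance frequencies (Hz) read from the roots of the polynomial."""
    roots = np.roots(a)
    roots = roots[roots.imag > 0]          # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

# Two hypothetical cases: closely spaced, broad resonances versus
# widely spaced, narrow resonances (placeholder values, not measurements).
merged = all_pole_model([600.0, 800.0], bw_hz=300.0)
separate = all_pole_model([500.0, 1500.0], bw_hz=80.0)

n_peaks_merged = len(envelope_peaks(merged))
n_peaks_separate = len(envelope_peaks(separate))
formants_merged = model_formants(merged)
```

In this idealised setting, the two closely spaced resonances produce a single envelope peak, although the polynomial roots still return both resonance frequencies, whereas the widely spaced resonances produce two envelope peaks. The sketch illustrates only the arithmetic of the mismatch and makes no claim about natural vowel sounds.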
As shown in Part I, with regard to high front vowels and r-coloured front vowels of some languages, sounds belonging to these vowels can exhibit, in part, similar first and second spectral envelope peaks, and formant analysis can reveal similar F1–F2. Thus, the sounds of the corresponding vowels are physically distinct only with regard to the third spectral envelope peak and the third formant, respectively.
For such languages, it follows that back vowels, as well as some of the front vowels, are physically describable in terms of different patterns of F1–F2, whereas the remaining front vowels have to be described only in terms of different patterns of F1–F2–F3.
Empirically, the number of vowel-specific relative spectral energy maxima and of calculated vowel-specific formants proves to be inconstant among different vowels.
With regard to spectral envelope peaks, then, the quality of some sounds of back vowels is represented by a single peak, the quality of other sounds of back vowels and sounds of some front vowels by two peaks and the quality of some front vowels by three peaks.
In the spectra of the sounds of certain speakers, an additional spectral envelope peak may occur between the expected first and second or the expected second and third formants. According to the prevailing methodological rules for determining formants, this maximum is not interpreted as vowel specific but as a specific characteristic of the voice of the speaker in question. Therefore, it is referred to as a “spurious” formant.
Such “spurious” spectral envelope peaks also need to be considered within the context of the inconstant number of vowel-specific spectral envelope peaks.
Synthetically produced, and easily recognisable, vowel sounds can be generated for most vowel qualities using three- and two-formant synthesis. For certain vowels, in particular for back vowels and /a–α/, this is also possible by way of a one-formant synthesis.
With regard to synthesised sounds perceived as belonging to one vowel quality, a comparison of the sounds with F1’–F2’ (two-formant synthesis) and the sounds with F1’–F2’–F3’ (three-formant synthesis) reveals differences for F2’, in particular for sounds of front vowels. Similarly, a comparison of the sounds with F1’ (one-formant synthesis) and the sounds with F1’–F2’ (two-formant synthesis) reveals differences for F1’. (However, in the corresponding comparative studies, the fundamental frequency used in the synthesis was not varied systematically.)
Synthesis thus confirms the inconstant number of observable vowel-specific formants. Further, synthesis involving different numbers of formants (different numbers of filters) indicates differences for F1’ or F2’, respectively, although the sounds in question are perceived as belonging to the same vowel.
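To make the notion of synthesis with different numbers of formants (different numbers of filters) concrete, the following Python sketch generates sounds by passing an impulse train through a cascade of one, two or three second-order resonators. This is a common textbook simplification and not necessarily the procedure used in the studies referred to. The formant frequencies, the bandwidth, the fundamental frequency, the sampling rate and all function names are placeholder assumptions and do not represent measured or published values.

```python
import numpy as np

def resonator(x, freq_hz, bw_hz, fs):
    """Filter x with one second-order all-pole resonance (one 'formant')."""
    r = np.exp(-np.pi * bw_hz / fs)
    theta = 2 * np.pi * freq_hz / fs
    a1, a2 = -2.0 * r * np.cos(theta), r * r
    y = np.zeros(len(x))
    y1 = y2 = 0.0                          # previous two output samples
    for n, xn in enumerate(x):
        yn = xn - a1 * y1 - a2 * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y

def synthesise(formants_hz, f0=100.0, fs=16_000, dur=0.5, bw_hz=80.0):
    """Impulse train at f0 passed through one resonator per formant."""
    n = int(fs * dur)
    source = np.zeros(n)
    source[::int(round(fs / f0))] = 1.0    # one impulse per fundamental period
    y = source
    for f in formants_hz:
        y = resonator(y, f, bw_hz, fs)
    return y / np.max(np.abs(y))           # normalise the peak amplitude to 1

def dominant_frequency(y, fs=16_000):
    """Frequency (Hz) of the strongest non-DC component of the spectrum."""
    spec = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), 1.0 / fs)
    return freqs[np.argmax(spec[1:]) + 1]

# Hypothetical placeholder configurations (not taken from any study):
one_formant = synthesise([400.0])
two_formant = synthesise([400.0, 900.0])
three_formant = synthesise([400.0, 900.0, 2500.0])
```

Because the fundamental frequency is an explicit parameter (`f0`), a sketch of this kind could also be used to vary it systematically while holding the filter settings constant, which is the variation noted above as missing from the comparative studies.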