
Acoustics of the Vowel

Preliminaries

by Dieter Maurer (Author)
Monographs XVI, 280 Pages
Open Access

Table of Contents

  • Cover
  • Title
  • Copyright
  • About the Book
  • This eBook can be cited
  • Acknowledgements
  • Contents
  • Introduction
  • Part I: Prevailing Theory and Empirical References
  • 1 Prevailing Theory
  • 1.1 General Acoustic Characteristics of Vowel Sounds
  • 1.2 Language-Specific Acoustic Characteristics of Vowel Sounds
  • 1.3 Speaker Group-Specific Acoustic Characteristics of Vowel Sounds
  • 1.4 Phonation Type-Specific Acoustic Characteristics of Vowel Sounds and Limitation to Voiced Oral Sounds
  • 1.5 Limitation to Isolated Vowel Sounds
  • 1.6 Limitation to Vowel Sounds as Monophthongs with Quasi-Constant Sound Characteristics
  • 1.7 Speech Community-Specific Acoustic Characteristics of Vowel Sounds
  • 1.8 The Prevailing Theory of Physical Vowel Representation
  • 1.9 Formalising Prevailing Theory
  • 1.10 Illustration
  • 2 Prevailing Empirical References
  • 2.1 General References
  • 2.2 Empirical Reference for Standard German
  • 2.3 Other Statistical References
  • Part II: Reflections
  • 3 Vowels and Number of Formants
  • 3.1 Inconstant Number of Vowel-Specific Relative Spectral Energy Maxima in Sounds of Back Vowels and of / a–α /
  • 3.2 Inconstant Correspondence between Vowel-Specific Relative Spectral Energy Maxima and Calculated Vowel-Specific Formant Patterns
  • 3.3 Inconstant Number of Vowel-Specific Relative Spectral Energy Maxima and of Calculated Vowel-Specific Formants
  • 3.4 Addition: “Spurious” Formants
  • 3.5 Addition: “Flat” Vowel Spectra
  • 3.6 Addition: Inconstant Number of Vowel-Specific Formants in Synthesis
  • 4 Vowels and Fundamental Frequency
  • 4.1 Fundamental Frequency, First Formant and “Grade” of Vowels
  • 4.2 Fundamental Frequency, Spectral Envelope, Formant Pattern and “Grade” of Vowels
  • 5 Formant Patterns and Speaker Groups
  • 5.1 Fundamental Frequency, Spectral Envelope, Formant Pattern and “Grade” of Vowels Uttered by Children, Women and Men
  • 5.2 One Vowel, Different Formant Patterns
  • 5.3 Different Vowels, One Formant Pattern
  • 5.4 A Gap in the Reasoning
  • 5.5 Addition: Formant Patterns of Voiced and Whispered Vowel Sounds
  • 6 Terms of Reference, Methods of Formant Estimation
  • 6.1 Formant and Sound Spectrum
  • 6.2 Speaker Group and Vocal-Tract Size
  • 6.3 Formant Analysis and Objectivisation
  • 6.4 Formant Analysis, Fundamental Frequency and Speaker Group or Vocal-Tract Size
  • 6.5 Addition: Parameter Adjustments in Formant Analysis and Inconsistent References to Vocal-Tract Size
  • 6.6 Addition: Spectrum, Formant Pattern, Resynthesis
  • 6.7 Addition: Formant Analysis and Objectivity with Regard to Synthesised Vowel Sounds
  • 6.8 Addition: Formant Patterns and Resynthesis outside of the Framework of Prevailing Theory
  • Part III: Experiences and Observations
  • 7 Unsystematic Correspondence between Vowels, Patterns of Relative Spectral Energy Maxima and Formant Patterns
  • 7.1 Inconstant Number of Vowel-Specific Relative Spectral Energy Maxima and Incongruence of Vowel-Specific Formant Patterns
  • 7.2 Partial Lack of Manifestation of Vowel-Specific Relative Spectral Energy Maxima
  • 7.3 Addition: Resynthesis and Synthesis
  • 8 Lack of Correspondence between Vowels and Patterns of Relative Spectral Energy Maxima or Formant Patterns
  • 8.1 Dependence of Vowel-Specific, Relative Spectral Energy Maxima and Lower Formants ≤ 1.5 kHz on Fundamental Frequency
  • 8.2 Vowel Perception at Fundamental Frequencies above Statistical Values of the First-Formant Frequency
  • 8.3 “Inversions” of Relative Spectral Energy Maxima and Minima and “Inverse” Formant Patterns in Sounds of Individual Vowels
  • 8.4 Addition: Whispered Vowel Sounds, Fundamental-Frequency Dependence of Vowel-Specific Spectral Characteristics and “Inversions”
  • 8.5 Addition: Resynthesis and Synthesis
  • 9 Ambiguous Correspondence between Vowels and Patterns of Relative Spectral Energy Maxima or Formant Patterns or Complete Spectral Envelopes
  • 9.1 Ambiguous Patterns of Relative Spectral Energy Maxima and Ambiguous Formant Patterns
  • 9.2 Ambiguous Spectral Envelopes
  • 9.3 Ambiguity and Individual Vowels
  • 9.4 Addition: Resynthesis and Synthesis
  • 10 Lack of Correspondence between Patterns of Relative Spectral Energy Maxima or Formant Patterns and Speaker Groups or Vocal-Tract Sizes
  • 10.1 Similar Patterns of Relative Spectral Maxima and Similar Formant Patterns ≤ 1.5 kHz for Different Speaker Groups or Different Vocal-Tract Sizes
  • 10.2 The Dichotomy of the Vowel Spectrum
  • 10.3 Addition: Whispered Vowel Sounds and Speaker Groups or Vocal-Tract Sizes
  • 10.4 Addition: Vowel Imitations by Birds
  • 10.5 Addition: Resynthesis and Synthesis
  • 11 Lack of Correlation between Methodological Limitations of Formant Determination and Limitations of Vowel Perception
  • 11.1 Vowel Perception at Fundamental Frequencies > 350 Hz
  • 11.2 Lack of Correspondence between Methodological Problems of Formant Pattern Estimation at Fundamental Frequencies ≤ 350 Hz and Impaired Vowel Perception
  • 11.3 Addition: Lack of Methodological Basis of Determining Formant Patterns for Vowel Mimicry by Birds
  • Part IV: Falsification
  • 12 Empirical Falsification despite Methodological Limitations of Determining Patterns of Relative Spectral Envelope Maxima or Formant Patterns
  • 12.1 Lack of Methodological Basis for Verifying Prevailing Theory
  • 12.2 Systematic Divergence of Empirical Findings from Predictions of Prevailing Theory
  • 12.3 Empirical Findings Directly Contradicting Prevailing Theory
  • Part V: Commentary
  • 13 Preliminaries
  • 13.1 Impediments to Adjusting Prevailing Theory
  • 13.2 Prevailing Theory as an Index
  • 13.3 Excursus: Vowel Quality and Harmonic Spectrum
  • 13.4 “Forefield”
  • 13.5 Two Approaches
  • 13.6 Phenomenology
  • 13.7 Theory Building
  • Afterword
  • Materials
  • Materials Part I
  • M1 Prevailing Theory
  • M2 Prevailing Empirical References
  • Materials Part II
  • M3 Vowels and Number of Formants
  • M4 Vowels and Fundamental Frequency
  • M5 Formant Patterns and Speaker Groups
  • M6 Terms of Reference, Methods of Formant Estimation
  • Materials Part III
  • Note on the Method
  • M7 Unsystematic Correspondence between Vowels, Patterns of Relative Spectral Energy Maxima and Formant Patterns
  • M7.1 Inconstant Number of Vowel-Specific Relative Spectral Energy Maxima and Incongruence of Vowel-Specific Formant Patterns
  • M7.2 Partial Lack of Manifestation of Vowel-Specific Relative Spectral Energy Maxima
  • M8 Lack of Correspondence between Vowels and Patterns of Relative Spectral Energy Maxima or Formant Patterns
  • M8.1 Dependence of Vowel-Specific, Relative Spectral Energy Maxima and Lower Formants ≤ 1.5 kHz on Fundamental Frequency
  • M8.2 Vowel Perception at Fundamental Frequencies above Statistical Values of the Respective First Formant Frequency
  • M8.3 “Inversions” of Relative Spectral Energy Maxima and Minima and “Inverse” Formant Patterns in Sounds of Individual Vowels
  • M9 Ambiguous Correspondence between Vowels and Patterns of Relative Spectral Energy Maxima or Formant Patterns or Complete Spectral Envelopes
  • M9.1 Ambiguous Patterns of Relative Spectral Energy Maxima and Ambiguous Formant Patterns
  • M9.2 Ambiguous Spectral Envelopes
  • M9.3 Ambiguity and Individual Vowels
  • M10 Lack of Correspondence between Patterns of Relative Spectral Energy Maxima or Formant Patterns and Age- and Gender-Related Speaker Groups or Vocal-Tract Sizes
  • M10.1 Similar Patterns of Relative Spectral Maxima and Similar Formant Patterns ≤ 1.5 kHz for Different Age and Gender-Related Speaker Groups or Vocal-Tract Sizes
  • M10.2 The Dichotomy of the Vowel Spectrum
  • M10.2A Addition: Vowel Imitations by Birds
  • M11 Lack of Correlation between Methodological Limitations of Formant Determination and Limitations of Vowel Perception
  • M11.1 Vowel Perception at Fundamental Frequencies > 350 Hz
  • M11.2 Lack of Correspondence between Methodological Problems of Formant Pattern Estimation at Fundamental Frequencies ≤ 350 Hz and Impaired Vowel Perception
  • Experiments
  • E1 Number of Relative Spectral Energy Maxima and Number of Formants
  • E1.1 Sounds of Back Vowels Showing only One Lower Spectral Peak ≤ 1.5 kHz
  • E1.2 Sounds of Back Vowels Showing only One Pronounced Lower Formant ≤ 1.5 kHz
  • E1.3 Sounds of Single Front Vowels Showing Non-Corresponding F2 and F3
  • E1.4 Sounds of Back Vowels Showing No Pronounced Spectral Peak ≤ 1.5 kHz
  • E1.5 Sounds of Front Vowels Showing No Pronounced Spectral Peak > 2 kHz
  • E2 Patterns of Relative Spectral Energy Maxima, Formant Patterns and Fundamental Frequency
  • E2.1 Sounds of Single Vowels Produced at Different F0 Exhibiting Different Spectral Peaks and Different Calculated Formant Patterns: Part 1, Dependence of Formant Patterns on F0
  • E2.2 Sounds of Single Vowels Produced at Different F0 Exhibiting Different Spectral Peaks and Different Calculated Formant Patterns: Part 2, Vowel Intelligibility for Sounds at F0 > 500 Hz
  • E2.3 Sounds of Single Vowels Produced at Different F0 Exhibiting Different Spectral Peaks and Different Calculated Formant Patterns: Part 3, Resynthesising a Formant Pattern at Different F0
  • E2.4 Sounds of Single Back Vowels Produced at Different F0 Exhibiting Inverse Spectral Peaks
  • E2.5 Special Note Concerning Inconstant Numerical Relationship between Calculated F0 and Formant Patterns
  • E3 Formant Pattern Ambiguity
  • E3.1 Formant Pattern Ambiguity in Natural Vocalisations
  • E3.2 Formant Pattern Ambiguity in Model Synthesis
  • E4 Patterns of Relative Spectral Energy Maxima, Formant Patterns and Age- and Gender-Related Vocal-Tract Sizes
  • E4.1 Comparison of Vowel-Specific Spectral Characteristics of Children, Women and Men Related to Different and Similar F0 of Vocalisations: Part 1, Natural Vocalisations
  • E4.2 Comparison of Vowel-Specific Spectral Characteristics of Children, Women and Men Related to Different and Similar F0 of Vocalisations: Part 2, Resynthesis
  • E5 Patterns of Relative Spectral Energy Maxima, Formant Patterns and Phonation Types
  • E5.1 Whispered Sounds Compared with Voiced Sounds at Different F0 in Utterances of a Single Speaker
  • E5.2 Whispered Sounds Compared with Voiced Sounds at Different F0 in Utterances of Speakers of Different Speaker Groups
  • E5.3 Sounds of Back Vowels Showing Three Spectral Peaks ≤ 1.5 kHz
  • E5.4 Sounds of Front Vowels Showing Two Spectral Peaks ≤ 1.5 kHz
  • E6 Patterns of Relative Spectral Energy Maxima, Formant Patterns and Vowel Imitation by Birds
  • E6.1 Direct Comparisons of Selected Sounds of Humans and Birds
  • E6.2 Resynthesis Relating to “Anomalous” Formant Patterns of Sounds of Birds
  • E7 Anomalous Vowel Spectra
  • E7.1 Spectra with Increasing Number of Harmonics Equal in Amplitude (“Flat” Vowel Spectra)
  • E7.2 Spectra with Increasing Number of Harmonic Pairs Showing Equal Amplitude Differences (“Ridged” Parts of Vowel Spectra)
  • E8 Aspects of Method
  • E8.1 Formant Pattern Estimation Related to Non-Standard Parameters
  • E8.2 Formant Pattern Estimation at F0 > 350 Hz
  • E8.3 Resynthesis of Sounds at Varying F0 and Subsequent Formant Pattern Estimation
  • List of Figures
  • List of Tables
  • References

Introduction

Topic and Aims

The vocal cords, when oscillating and modulating air expelled from the lungs, produce a sound (a source sound), which is transformed by the resonances of the pharyngeal, oral and nasal cavities. Depending on the position of the larynx, velum, tongue, lips and jaw, these cavities take on different shapes and thus different resonance characteristics, allowing different vocal sounds (phones) to be produced and perceived accordingly. If a vocal sound is perceived to belong to a particular linguistic unit (more precisely, a basic linguistic unit, a phoneme), and if the cavity formed by the pharynx and the mouth remains open, then the sound produced is referred to as a vowel sound and its linguistic identity as a vowel quality or simply as a vowel.

The prevailing theory of vowel acoustics begins with such formulations, or similar ones. According to this theory, with respect to human utterances, the vocal cords produce a general sound, which is transformed into a specific vowel sound by the resonances of the (supralaryngeal) vocal tract: as human beings, we phonate and articulate.

Because of this, vowel sounds, as sounds, are expected to exhibit relative spectral energy maxima in those frequency ranges that correspond to the resonances of the vocal tract during speech production. These spectral energy maxima are known as formants.
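The relationship just described, between vocal-tract resonances and relative spectral energy maxima, can be illustrated with a small sketch of the prevailing model. It is a hedged illustration under explicit assumptions, not the author's method or data: the source is idealised as harmonics falling at -12 dB per octave, each resonance is modelled as a two-pole resonator normalised to unity gain at 0 Hz, and the resonance values (`NEUTRAL_TUBE`) are the textbook first three resonances of a uniform tube of about 17.5 cm, with assumed bandwidths. The helper names are hypothetical. The sketch keeps "resonance" (a property of the filter) and "relative spectral energy maximum" (a harmonic louder than its neighbours) as two distinct things.

```python
import math

# Illustrative values only (assumed, not taken from the treatise): first three
# resonances of a uniform tube about 17.5 cm long, as (frequency Hz, bandwidth Hz).
NEUTRAL_TUBE = [(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)]


def resonator_gain(f, fc, bw):
    """Magnitude of a two-pole resonance at frequency f, normalised to 1 at 0 Hz."""
    sigma = math.pi * bw          # pole damping (rad/s), from the bandwidth
    wc = 2 * math.pi * fc         # resonance frequency (rad/s)
    w = 2 * math.pi * f
    real = sigma**2 + wc**2 - w**2
    imag = 2 * sigma * w
    return (sigma**2 + wc**2) / math.hypot(real, imag)


def harmonic_spectrum(f0, resonances, fmax=4000.0):
    """Level in dB of each harmonic of f0 up to fmax: source slope times resonances."""
    spectrum = []
    k = 1
    while k * f0 <= fmax:
        f = k * f0
        amplitude = 1.0 / k**2    # assumed idealised source, about -12 dB per octave
        for fc, bw in resonances:
            amplitude *= resonator_gain(f, fc, bw)
        spectrum.append((f, 20 * math.log10(amplitude)))
        k += 1
    return spectrum


def spectral_maxima(spectrum):
    """Frequencies of harmonics that are louder than both neighbouring harmonics."""
    return [
        spectrum[i][0]
        for i in range(1, len(spectrum) - 1)
        if spectrum[i - 1][1] < spectrum[i][1] > spectrum[i + 1][1]
    ]


print(spectral_maxima(harmonic_spectrum(100.0, NEUTRAL_TUBE)))  # → [500.0, 1500.0, 2500.0]
print(spectral_maxima(harmonic_spectrum(400.0, NEUTRAL_TUBE)))
```

At a fundamental frequency of 100 Hz the harmonics sample the assumed envelope densely, and the maxima coincide with the assumed resonances. At 400 Hz, the same resonances are sampled far more sparsely, and any reported maximum can only lie at a multiple of the fundamental frequency.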

Such a perspective gives rise to the prevailing psychophysical principle of the vowel: vowel sounds that are perceived as having the same vowel quality have similar formant patterns, that is, similarly patterned relative spectral energy maxima. By contrast, vowel sounds that are perceived as different vowel qualities have dissimilar formant patterns.

At first glance, such a conception of vowel production and of the subsequent physical representation of vowels seems plausible or even self-evident. Our vocal cords do vibrate when we speak, we do move our mouths (more precisely, our articulators) to form different vocal sounds, and we are indeed often able to "lip read" the words uttered from such movements, an ability that many deaf people develop to a high degree.

Moreover, the vast majority of statistical investigations seem to confirm the correlation between vowels and vowel-specific formant patterns.

Vowel synthesis, which transforms artificial source sounds by means of filters, has also proven very capable of producing recognisable vowel sounds.
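Such filter-based synthesis can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's procedure: an idealised impulse train stands in for the glottal source (practical synthesisers also shape the source spectrum), and each resonance is a standard two-pole digital resonator of the kind used in cascade formant synthesisers, with coefficients chosen for unity gain at 0 Hz. The helper names `resonate` and `synthesise_vowel`, the sampling rate and the resonance values are hypothetical choices for illustration.

```python
import math


def resonate(signal, freq, bw, fs):
    """Filter a signal through one two-pole digital resonator with unity gain at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * t)
    b = 2 * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * freq * t)
    a = 1.0 - b - c                   # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0                     # previous two output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesise_vowel(f0, resonances, fs=16000, duration=0.5):
    """Impulse-train source at f0, shaped by a cascade of resonators (hypothetical helper)."""
    n = int(fs * duration)
    period = fs / f0
    source = [0.0] * n
    k = 0
    while round(k * period) < n:
        source[round(k * period)] = 1.0   # one idealised pulse per fundamental period
        k += 1
    signal = source
    for freq, bw in resonances:           # cascade: each resonator filters the previous output
        signal = resonate(signal, freq, bw, fs)
    return signal


samples = synthesise_vowel(120.0, [(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)])
```

The list `samples` can be scaled and written to a sound file (for instance with Python's standard `wave` module) for listening. A cascade is used here only because it is the simplest arrangement; parallel resonator banks are an equally common alternative.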

From such a perspective, existing problems in analysing and determining the physical characteristics of vowel sounds in relation to the perceived vowel quality are not attributed to the principle of the prevailing theory itself, but to the dynamics and complexity of speech production and perception. Furthermore, isolated vowel sounds, for which a simple statistical correspondence between the perceived vowel quality and its specific formant pattern is to be expected, are often considered to play only a marginal role in everyday speech. In speech, vowel sounds and perceived vowel qualities are generally embedded in syntactic and semantic contexts, that is, in contexts of other vocal sounds and of meaning. Such embedded vowel sounds exhibit distinct dynamic processes, above all transitions from one sound to another. Thus, vowels may be perceived in speech even if distinct, static sound elements are absent, and a vowel sound extracted from speech as a sound fragment may be perceived as a different vowel quality than the same sound in connected speech. This explains, for example, why speech can remain intelligible even when its transmission is subject to substantial interference or transformation. And so on.

Consequently, current scientific discussion focuses mainly on specific matters such as different types of phonation and articulation in producing vowel sounds, sound variations and dynamic processes related to the respective syntactic and semantic context, sounds produced by speakers of different age and gender and corresponding normalisation attempts, attempts to improve formant pattern estimation, and attempts to relate acoustic findings to processes of auditory perception. And so on.

Notwithstanding all this, the present treatise returns to the basic assertion of the current acoustic theory of the vowel cited at the beginning of this introduction. It presents a critical reading, indeed a falsification, of this assertion. Further, it seeks to demonstrate that whereas the prevailing theory indicates (is an index of) the actual physical characteristics of vowels, it fails to designate these characteristics adequately. As such, this work highlights an unresolved fundamental problem of the voiced speech sound, and thus of the voice as such, and raises this problem once again for discussion.

The form of this treatise is, in part, unusual in a scientific context. However, with the exception of the four aspects discussed below, this introduction dispenses with lengthy prefatory explanations. In the course of the text, the argument and its form of presentation should become self-evident. Additional comments in the afterword further expand on, and hopefully clarify, matters.

As mentioned, however, four introductory aspects are to be explained at this juncture. They concern linguistic expression and style, referencing, the significance of argumentation and the perspective adopted here.

Many parts of the main body of the text are “abstract” in their presentation, which is to say, they are “technical”. This might complicate the reading. Moreover, with the exception of Sections 1.10, 2.1 and 2.2, the text is not accompanied by illustrated examples or tables listing statistical data. Further, from Part III onwards, the text requires the reader to reflect thoroughly on the prevailing theory of the vowel as presented in Part I. The text also calls upon the reader to approach the related terms and concepts and the statistical values for formant patterns with a certain amount of self-assurance. However, such a procedure is necessary: the text insists on the discussion of a few fundamental reflections and general facts, and their interrelations, in the attempt, as mentioned, to highlight a fundamental problem.

Most of the issues considered here have already been discussed in the literature, mostly in publications by other authors. However, they have often been interpreted from a point of view different from the one taken here. Nevertheless, apart from the illustrations and tables mentioned, the text largely dispenses with explicit references to previous studies, including our own, in order to pursue its main argument without detailed discussion and referencing of individual aspects. The Materials section (for the structure of this text, see below), however, includes a considerable number of citations, together with references to existing publications. Moreover, as indicated above, my colleagues and I have discussed most of the aspects addressed here elsewhere. What is new in the present text is its course of argument, as well as the arrangement and presentation of citations, comments, illustrated examples and outlines of experiments in the Materials and Experiments sections. New content in the strict sense, however, concerns only the aspects discussed in Part V and in the afterword, some presentations in the Materials section (see Sections M8.2, M10-A) and some examples in the Experiments section.

The empirical basis of this treatise, to which many of the statements made here refer, above all in Parts III and IV, consists of recordings from various areas of everyday life, the entertainment sector and art, that is, stage voices in music and straight theatre. One part of these recordings forms the basis of individual investigations published in the past; another part is unpublished, and these recordings have not been subjected to any identification tests beyond the author's own identification. Thus, the reflections in Parts III and IV lay no claim to consistent verification in terms of existing scientific standards. Instead, they are formulated as hypotheses in view of general findings that are conceivable or even predictable. Accordingly, illustrated examples are given in the Materials section.

Accordingly, this treatise is limited to presenting anew, and interrelating, those reflections, experiences and observations that tend to refute the assertion that vowel qualities are physically represented by formant patterns. If this undertaking proves successful, then, to repeat and insist, the question of the voiced speech sound is once again raised as a fundamental problem.

The argument focuses on and is limited to the relationship between individual vowel sounds, perceived vowel qualities, corresponding sound spectra and formant patterns in the sense of patterns of formant frequencies. Formant bandwidths and amplitudes, to mention two aspects of possible importance, are not discussed in detail.

This treatise adopts a decidedly psychophysical perspective. Only general reference is made to the production and perception of sounds: sound production is referred to because the concept of formants itself refers to vocal tract resonances and also because this relationship needs to be emphasised repeatedly in the course of the argument. Sound perception is referred to because the reflections presuppose that the vowel sounds discussed can be attributed to (perceptually identified as) the specific vowel qualities in question. Beyond these general references, however, production and perception are not further discussed.

By no means does excluding a consideration of further details of sound production and perception from the present discussion suggest that these aspects are unimportant for the physical description of vowels. Doing so merely serves to focus on the psychophysical question of the vowel: given that an utterance—or its reproduction, manipulated or not, or a synthesis for that matter—is perceived as a specific vowel quality, which describable physical characteristic or which ensemble of physical characteristics may be said to represent that quality?

In line with this, the argument focuses on voiced oral vowel sounds produced either in isolation or isolated (extracted) from syntactic and semantic contexts. Thus, nasalisation and the syntactic and semantic context are also excluded from discussion. Of the different types of phonation, only whispered vowels are considered, and only briefly. Again, this is intended to enable a straightforward discussion of the psychophysical question of the vowel.

Summary

It seems as if the fundamentals of how we produce vowels and how they are acoustically represented have been clarified: we phonate and articulate. Using our vocal cords, we produce a vocal sound or noise, which is then shaped into a specific vowel sound by the resonances of the pharyngeal, oral and nasal cavities, that is, the vocal tract. Accordingly, the acoustic description of vowels relates to vowel-specific patterns of relative energy maxima in the sound spectra, known as formant patterns.
The intellectual and empirical reasoning presented in this treatise, however, gives rise to scepticism with respect to this understanding of the sound of the vowel. The reflections and materials presented provide reason to argue that, up to now, a comprehensible theory of the acoustics of the voice and of voiced speech sounds is lacking, and consequently, no satisfying understanding of vowels as an achievement and particular formal accomplishment of the voice exists. Thus, the question of the acoustics of the vowel – and with it the question of the acoustics of the voice itself – proves to be an unresolved fundamental problem.

Details

Pages: XVI, 280
ISBN (PDF): 9783034323918
ISBN (ePUB): 9783034326179
ISBN (MOBI): 9783034326186
ISBN (Book): 9783034320313
Open Access: CC-BY-NC-ND
Language: English
Publication date: 2016 (January)
Published: Bern, Berlin, Bruxelles, Frankfurt am Main, New York, Oxford, Wien, 2016. XVI, 280 pp.

Biographical notes

Dieter Maurer (Author)

Dieter Maurer (*1955, Zurich) studied education, philosophy and psychology at the universities of Tübingen and Zurich. His dissertation, subsequent research and publications are devoted to inquiry into the formal character of vocal and visual expressions, as aspects of the syntactics of voices and pictures. His work addresses both foundational studies and art education. He is currently engaged in research and teaching at the Zurich University of the Arts.
