Open access

Individual Differences in Speech Production and Perception


Edited By Susanne Fuchs, Daniel Pape, Caterina Petrone and Pascal Perrier

Inter-individual variation in speech is a topic of increasing interest both in human sciences and speech technology. It can yield important insights into biological, cognitive, communicative, and social aspects of language. Written by specialists in psycholinguistics, phonetics, speech development, speech perception and speech technology, this volume presents experimental and modeling studies that provide the reader with a deep understanding of interspeaker variability and its role in speech processing, speech development, and interspeaker interactions. It discusses how theoretical models take into account individual behavior, explains why interspeaker variability enriches speech communication, and summarizes the limitations of the use of speaker information in forensics.



In the night of January 1st, 2015, mankind approached a size of human beings (see In this context, it seems an illusion to study individual behaviour in speech production and perception, even within a certain language. However, inter-individual variation in speech is a topic of increasing interest in linguistics and psychology, and it is the topic of our book.


Theoretical approaches have undergone a paradigm shift, moving from abstractionist to exemplar and hybrid models. Abstractionist models treat speaker variation independently of abstract linguistic entities and consider it noise in the data, which could be eliminated. Exemplar approaches take a different view: they assume no separation of linguistic categories from other contextual information, e.g., indexical information about the speaker and his/her voice. All of this may potentially be stored in memory. The two approaches can be seen as extremes, and their ideas may be combined (hybrid models). In this sense we would not doubt that abstract representations of linguistic categories exist, but we would also acknowledge the richness and multidimensionality of speech signals, which can facilitate speech perception.

When we talk about individual behaviour in this book, we are specifically interested in the details of the speech signal that can give us further insights into the multiple factors affecting speech production, processing, and comprehension. In this respect, we are not interested in every little detail of a single speaker or listener, but rather in consistent details of speech production and perception. The crux of such an approach is to find out which of these details reveal important information about the biological, linguistic, cognitive, and social underpinnings of language in context.

The authors of this book succeeded in finding several such consistencies and discuss them in light of the mechanisms involved in the fascinating ability to produce and perceive speech. In particular:

Rachel Smith starts her chapter with an overview of how inter-speaker variability has been treated by different perception theories. The focus is on abstractionist, exemplar, and hybrid approaches. These vary in how much they take inter-speaker variability into account as an information source and store this information in memory. The author continues with a comprehensive review of studies investigating fine phonetic detail, which can reveal insights into numerous characteristics of a given speaker and commonalities across speaker groups.

Frank Eisner reviews some recent findings on how listeners can adapt to speaker variation and what role this variation plays in the learning of perceptual categories in adults. Eisner provides evidence that exposure to multiple speakers could help in learning abstract representations on a lexical level. Sub-lexical processing of speaker-idiosyncratic properties additionally has an impact on speech perception, as shown by neurobiological and computational models. In particular, previously learned idiosyncratic properties influence perceptual expectations.

Marieke van Heugten, Christina Bergmann, and Alejandrina Cristia provide complementary evidence about perceptual learning with a particular focus on spoken language acquisition. Specifically, they review the literature on how young children and toddlers cope with speaker differences, regional accents, and language variation when acquiring their mother tongue. Although processing unfamiliar voices and accents is more complex than processing familiar ones, small children are extremely flexible in coping with speaker variation, and they even take advantage of it to learn their language. Indeed, infants use variability in speakers’ voices to access the underlying structure. Differences in the way individuals speak can thus serve as a frame of reference to help infants accommodate variation.

Benjamin Swets studies the cognitive architecture of language. He summarizes his work on individual differences in the scope of advance planning. His results show consistently that individual differences can be systematic and, in his particular topic, reveal insights into the relation between individual working memory capacities and the scope of advance speech planning. Furthermore, he suggests that the size of working memory capacities could play a general role in packing information together for production and comprehension purposes.

Francesco Cangemi, Martina Krüger, and Martine Grice explicitly study the nature of the link between speaker- and listener-specific behaviour in the production and perception of prosodic categories. Their particularly novel finding is that speakers vary contextually, i.e. a given speaker can be more intelligible than other speakers for a particular listener, although she/he may be less intelligible than average for another specific listener. These findings suggest that speech comprehension of prosodic categories is shaped by the specificities of particular dyads.

Iris Chuoying Ouyang and Elsi Kaiser, too, dedicate their chapter to prosody. They investigate the prosodic realization of information-structural factors (new-information and corrective focus), crossed with information-theoretic factors (word frequency and contextual probability), in terms of both inter- and intra-speaker variability. The results show that these two types of factors interact in determining several aspects of the fundamental frequency contours. Moreover, speakers exhibit individual variability regarding the magnitude of prosodic cues, but the direction of prosodic distinctions between information categories is consistent across speakers.

Melanie Weirich presents her work on organic sources for inter-speaker variability in articulation with an emphasis on palatal shape, vocal tract dimensions, and tongue biomechanics. The speaker groups that are taken into account are monozygotic versus dizygotic twins who grew up together, and male versus female adults. Based on the analyses of selected phonemes and phonemic contrasts, it is shown that individual differences in organic structures can at least partially explain some idiosyncratic aspects of articulation, and the often observed speaker variation is far more than only random noise.

Pascal Perrier and Ralf Winkler tackle inter-speaker variation from the perspective of the biomechanical properties of the orofacial system. For this purpose they used biomechanical models, since there is no direct way to observe the consequences of the control by the Central Nervous System and those of the biomechanics of the motor system independently. In the first study, the authors show that inter-speaker differences in the main fibre direction of the Styloglossus muscle can shape the articulatory and acoustic variability in a high vowel. In the second study, the authors show that different implementations of the Orbicularis Oris muscle have an impact on the degree of lip aperture in speech production.

Jean-François Bonastre, Juliette Kahn, Solange Rossato, and Moez Ajili complete the book with their chapter on an applied topic – forensic speaker recognition. In particular, they warn against drawing conclusions about the identity of a speaker from speech in the way one would from a fingerprint or a DNA analysis. The acoustic signal of a speaker cannot be interpreted as a physical biometric. It is a complex signal that includes information about the human being as a biopsychosocial unit in interaction with others. The authors summarize the main weaknesses of the methodology that make forensic phonetics in court a controversial topic, even if automatic speech recognition has substantially improved its algorithms over the last decades.

This book was inspired by ideas from the project “SPEECHart – Speaker-specific articulation as adaptation to individual vocal tract shapes” (sponsored by the German Research Council) and the fourth summer school on “Speech production and perception: Speaker-specific behaviour”, which was held from September 30th to October 4th, 2013, in Aix-en-Provence. The summer school was jointly organized by the Laboratoire Parole et Langage in Aix-en-Provence, the Centre for General Linguistics in Berlin, and the GIPSA-lab in Grenoble. It could take place thanks to the financial support of the Ministry for Education and Research (BMBF) and the PILIOS project, which was sponsored by the French-German University in Saarbrücken.