Individual Differences in Speech Production and Perception

by Susanne Fuchs (Volume editor), Daniel Pape (Volume editor), Caterina Petrone (Volume editor), Pascal Perrier (Volume editor)
Edited Collection 284 Pages

Table of Contents

  • Cover
  • Title
  • Copyright
  • About the Editors
  • About the Book
  • This eBook can be cited
  • Contents
  • Preface
  • Perception of Speaker-Specific Phonetic Detail (Rachel Smith)
  • Perceptual Adjustments to Speaker Variation (Frank Eisner)
  • The Effects of Talker Voice and Accent on Young Children’s Speech Perception (Marieke van Heugten / Christina Bergmann / Alejandrina Cristia)
  • Psycholinguistics and Planning: A Focus on Individual Differences (Benjamin Swets)
  • Listener-Specific Perception of Speaker-Specific Productions in Intonation (Francesco Cangemi / Martina Krüger / Martine Grice)
  • Individual Differences in the Prosodic Encoding of Informativity (Iris Chuoying Ouyang / Elsi Kaiser)
  • Organic Sources of Inter-Speaker Variability in Articulation: Insights from Twin Studies and Male and Female Speech (Melanie Weirich)
  • Biomechanics of the Orofacial Motor System: Influence of Speaker-Specific Characteristics on Speech Production (Pascal Perrier / Ralf Winkler)
  • Forensic Speaker Recognition: Mirages and Reality (Jean-François Bonastre / Juliette Kahn / Solange Rossato / Moez Ajili)
  • Series Index


On the night of January 1st, 2015, mankind approached a size of human beings (see http://www.dsw.org/home.html). In this context, it seems an illusion to study individual behaviour in speech production and perception, even within a certain language. However, inter-individual variation in speech is a topic of increasing interest in linguistics and psychology, and it is the topic of our book.


Theoretical approaches have undergone a paradigm shift, moving from abstractionist to exemplar and hybrid models. Abstractionist models treat speaker variation independently of abstract linguistic entities and consider it noise in the data that could be eliminated. Exemplar approaches take a different view, assuming no separation of linguistic categories from other contextual information, e.g., indexical information about the speaker and his/her voice. All of this may potentially be stored in memory. The two approaches can be seen as extremes, but ideas from both may be combined (hybrid models). In this sense we would not doubt that abstract representations of linguistic categories exist, but we would also acknowledge the richness and multidimensionality of speech signals, which can facilitate speech perception.

When we talk about individual behaviour in this book, we are specifically interested in the details of the speech signals that can give us further insights into the multiple factors affecting speech production, processing, and comprehension. That is, we are not interested in every little detail of a single speaker or listener, but rather in consistent details of speech production and perception. The crux of such an approach is to find out which of these details reveal important information about the biological, linguistic, cognitive, and social underpinnings of language in context.

The authors of this book were successful in finding several consistencies and discuss them in light of the mechanisms involved in the fascinating ability to produce and perceive speech. In particular:

Rachel Smith starts her chapter with an overview of how inter-speaker variability has been treated by different perception theories. The focus lies in particular on abstractionist, exemplar, and hybrid approaches. These vary in how far they treat inter-speaker variability as a source of information and store this information in memory. The author continues with a comprehensive review of studies investigating fine phonetic detail, which can reveal insights concerning numerous variables of a given speaker and commonalities across speaker groups.

Frank Eisner reviews some recent findings on how listeners can adapt to speaker variation and what role this variation plays in adults' learning of perceptual categories. Eisner provides evidence that exposure to multiple speakers could help in learning abstract representations on a lexical level. Sub-lexical processing of speaker-idiosyncratic properties additionally has an impact on speech perception, as shown by neurobiological and computational models. In particular, previously learned idiosyncratic properties influence perceptual expectations.

Marieke van Heugten, Christina Bergmann, and Alejandrina Cristia provide complementary evidence about perceptual learning with a particular focus on spoken language acquisition. Specifically, they review the literature on how young children and toddlers cope with speaker differences, regional accents, and language variation when acquiring their mother tongue. Although processing unfamiliar voices and accents is more complex than processing familiar ones, small children are extremely flexible in coping with speaker variation, and they even take advantage of it to learn their language. Indeed, infants use variability in speakers’ voices to access the underlying structure. Differences in the way individuals speak can thus serve as a frame of reference to help infants accommodate variation.

Benjamin Swets studies the cognitive architecture of language. He summarizes his work on individual differences in the scope of advance planning. His results show consistently that individual differences can be systematic and, in his particular topic, reveal insights into the relation between individual working memory capacities and the scope of advance speech planning. Furthermore, he suggests that the size of working memory capacities could play a general role in packing information together for production and comprehension purposes.

Francesco Cangemi, Martina Krüger, and Martine Grice explicitly study the nature of the link between speaker- and listener-specific behaviour in the production and perception of prosodic categories. Their particularly novel finding is that speakers vary contextually, i.e. a given speaker can be more intelligible than other speakers for a particular listener, although she/he may be less intelligible than average for another specific listener. These findings suggest that speech comprehension of prosodic categories is shaped by the specificities of particular dyads.

Iris Chuoying Ouyang and Elsi Kaiser, too, dedicate their chapter to prosody. They investigate the prosodic realization of information-structural factors (new-information and corrective focus), crossed with information-theoretic factors (word frequency and contextual probability), in terms of both inter- and intra-speaker variability. The results show that these two types of factors interact in determining several aspects of the fundamental frequency contours. Moreover, speakers exhibit individual variability regarding the magnitude of prosodic cues, but the direction of prosodic distinctions between information categories is consistent across speakers.

Melanie Weirich presents her work on organic sources for inter-speaker variability in articulation with an emphasis on palatal shape, vocal tract dimensions, and tongue biomechanics. The speaker groups that are taken into account are monozygotic versus dizygotic twins who grew up together, and male versus female adults. Based on the analyses of selected phonemes and phonemic contrasts, it is shown that individual differences in organic structures can at least partially explain some idiosyncratic aspects of articulation, and the often observed speaker variation is far more than only random noise.

Pascal Perrier and Ralf Winkler tackle inter-speaker variation from the perspective of the biomechanical properties of the orofacial system. For this purpose they used biomechanical models, since there is no direct way to observe the consequences of the control by the Central Nervous System and those of the biomechanics of the motor system independently. In the first study, the authors show that inter-speaker differences in the main fibre direction of the Styloglossus muscle can shape the articulatory and acoustic variability in a high vowel. In the second study, the authors show that different implementations of the Orbicularis Oris muscle have an impact on the degree of lip aperture in speech production.

Jean-François Bonastre, Juliette Kahn, Solange Rossato, and Moez Ajili complete the book with their chapter on an applied topic: forensic speaker recognition. They warn in particular against drawing conclusions about the identity of a speaker in the way one would from a fingerprint or a DNA analysis. The acoustic signal of a speaker cannot be interpreted as physical biometrics. It is a complex signal including information about the human being as a biopsychosocial unit in interaction with others. The authors summarize the main weaknesses of the methodology that make forensic phonetics in court a controversial topic, even if automatic speech recognition has substantially improved its algorithms over the last decades.

This book was inspired by the ideas from the project “SPEECHart- Speaker-specific articulation as adaptation to individual vocal tract shapes” (sponsored by the German Research Council) and the fourth summer school on “Speech production and perception: Speaker-specific behaviour”, which was held from September 30th to October 4th, 2013, in Aix-en-Provence. The summer school was jointly organized by the Laboratoire Parole et Langage in Aix-en-Provence, the Centre for General Linguistics in Berlin, and the GIPSA-lab in Grenoble. It could take place thanks to the financial support of the Ministry for Education and Research (BMBF) and the PILIOS project, which was sponsored by the French-German University in Saarbrücken.

Rachel Smith

University of Glasgow

Perception of Speaker-Specific Phonetic Detail

Abstract: The individual speaker is one source among many of systematic variation in the speech signal. As such, speaker idiosyncrasies have attracted growing interest among researchers of speech perception, especially since the 1990s, when theories began to treat variation as information rather than noise. It is now a common assumption that people remember and respond to speaker-specific phonetic behaviour. But what aspects of speaker-specific behaviour are learned about and used to guide perception? Do listeners make full use of the richness of speaker-specific information available in the signal, and how can listeners’ use of such information be modelled? In this chapter I review evidence that processing of the linguistic message is affected by inter-speaker variation in a number of aspects of phonetic detail. Phonetic detail is defined here as patterns of phonetic information that are systematically distributed in the signal and perform particular linguistic or conversational functions, but whose perceptual contribution extends beyond signalling basic phonological contrasts (such as differences between phonemes or between categories of pitch accent). Following Polysp, the Polysystemic Speech Perception model of Hawkins and colleagues (Hawkins and Smith, 2001; Hawkins, 2003, 2010), I argue that people can learn about speaker-specific realisations of any type of linguistic structure, from sub-phonemic features up to larger prosodic structures and, potentially, conversational units such as speaking turns. Speaker-specific attributes may even, on a more associative basis, enable direct access to aspects of meaning. I discuss circumstances liable to promote or disfavour the storage of speaker-specific phonetic detail, considering issues such as the frequency and salience of particular speaker-specific patterns in the input, and listener biases in attribution of variation to possible causes.

1.   The changing role of the speaker in speech perception theories

Individual speakers are a source of considerable variability in the realisation of linguistic categories. This much has been clear since the early days of acoustic phonetics: for example, Peterson and Barney (1952) measured formant frequencies of American English vowels spoken by adult male, female and child speakers, and demonstrated not only extensive within-category variation, but also between-category overlap, when vowel tokens were plotted in F1-F2 space. Very many speech production studies show that, while speakers behave consistently with one another in many ways, there is also a significant degree of variability among them. For example, Johnson et al. (1993) found variation in the degree to which speakers of American English recruited the jaw to produce low vowels; Borden and Gay (1979) observed some speakers to produce /s/ with the tongue-tip up and others with it down (for a few more examples among many, see Dilley et al., 1996; Fougeron and Keating, 1997; van den Heuvel et al., 1996).

The implications of this inter-speaker variability for perception have been interpreted in shifting ways over the years. In the 1970s and 1980s, the dominant assumption was that speaker variability had to be stripped away, or normalised, before sounds and words could be recognised. Halle (1985: 101) writes: “when we learn a new word we practically never remember most of the salient acoustic properties that must have been present in the signal that struck our ears; for example, we do not remember the voice quality of the person who taught us the word or the rate at which the word was pronounced.” Views such as Halle’s are often referred to as abstractionism: i.e. the assumption that the brain must store abstract linguistic units, in order to account for the compositionality of language (e.g. McClelland and Elman, 1986; Norris et al., 2000; Pisoni and Luce, 1987). According to abstractionist views, the perceptual details of individual utterances do not ordinarily form part of linguistic representation. (Nonetheless the perceptual details of spoken utterances can be remembered and accessed for some purposes, such as autobiographical memory.) With isolated exceptions (Klatt, 1979 and to a lesser extent Wickelgren, 1969), the idea that words are stored in the form of discrete symbolic units dominated psycholinguistics and speech perception research until the 1990s. Accordingly, researchers sought to develop the best algorithms to normalise the speech signal across speakers, and/or to identify properties of sounds that remained invariant across speakers (e.g. Stevens, 1989).


Publication date: 2015 (September)
Frankfurt am Main, Berlin, Bern, Bruxelles, New York, Oxford, Wien, 2015. 284 pp., 3 tables, 55 graphs

Biographical notes

Susanne Fuchs works at ZAS Berlin and is an expert in speech production. Daniel Pape works at the University of Aveiro. He is an expert in speech perception. Caterina Petrone is a CNRS researcher at the LPL in Aix-en-Provence and an expert in prosody. Pascal Perrier is a professor at Université Grenoble Alpes and an expert in speech production models.

