Biomechanics of the Orofacial Motor System: Influence of Speaker-Specific Characteristics on Speech Production
1: Gipsa-lab, CNRS and Grenoble INP 2: Centre for General Linguistics Berlin
Abstract: Orofacial biomechanics has been shown to influence the time signals of speech production and to impose constraints with which the central nervous system has to contend in order to achieve the goals of speech production. After a short explanation of the concept of biomechanics and its link with the variables usually measured in phonetics, two modeling studies are presented, which exemplify the influence of speaker-specific vocal tract morphology and muscle anatomy on speech production. First, speaker-specific 2D biomechanical models of the vocal tract were used that accounted for inter-speaker differences in head morphology. In particular, speakers have different main fiber orientations in the Styloglossus Muscle. Focusing on vowel /i/ it was shown that these differences induce speaker-specific susceptibility to changes in this muscle’s activation. Second, the study by Stavness et al. (2013) is summarized. These authors investigated the role of a potential inter-speaker variability of the Orbicularis Oris Muscle implementation with a 3D biomechanical face model. A deeper implementation tends to reduce lip aperture; an increase in peripheralness tends to increase lip protrusion. With these studies, we illustrate the fact that speaker-specific orofacial biomechanics influences the patterns of articulatory and acoustic variability, and the emergence of speech control strategies.
The variability of speech production observed across native speakers of the same language results from a combination of multiple, complex factors. Among them we can mention social factors such as family origins (Hazen, 2002; Foulkes and Docherty, 2006), gender identity (Fuchs et al., 2010), and sexual orientation (Munson and Babel, 2007), and more intrinsic physical factors such as vocal tract morphology (Fuchs et al., 2008; Winkler et al., 2011a; Lammert et al., 2013) and orofacial biomechanics. In this paper we will focus on the influence of orofacial biomechanics.
By biomechanics we mean the mechanics of the human body, and by mechanics we mean:
1) the description of the forces or stresses acting on the body (i.e. the kinetics of the body);
2) the characterization of the intrinsic mechanical properties of the body, i.e. mass, stiffness, damping, elasticity…;
3) the mathematical formulation of the physical rules determining the link between the forces and stresses applied to the body, and the time motion/deformation of the body; this describes the dynamics of the body interacting with its physical environment (see Winters, 2009 for an excellent course about biomechanics and human movements).
Note that the variables characterizing the time motion/deformation of the body, namely its position, velocity and acceleration, are called kinematic variables. Kinematic variables are the variables that are usually measured in experimental phonetics. Hence, a biomechanical characterization of speech production goes further into the origins of movements than traditional experimental phonetics.
The quantitative evaluation of the influence of biomechanics on speech articulation in healthy speakers is difficult to achieve. Indeed, when humans or animals produce an intentional movement, their central nervous system (CNS) sends a number of commands to the muscles. These commands, called motor commands, not only generate a displacement of the peripheral motor system (for example the finger, the arm, the tongue or the mandible), but also change some of the biomechanical characteristics of the motor system. It is known for example that the activation of a muscle generates a stiffening of this muscle in the direction orthogonal to the muscle fibers, a phenomenon called stress stiffening that is easily observable when someone strongly activates his or her biceps. A sequence of motor commands that achieves a given motor task is called a motor control strategy. In speech production healthy speakers have learned how to elaborate motor control strategies for their speech production apparatus in a way that ensures the efficacy of the communication with listeners. Hence, only the result of the combined influences of motor control strategies and biomechanics can be experimentally observed. To evaluate the respective contribution of these two factors individually, it is necessary to design separate models of motor control and biomechanics. These models account for the specific influence of each of these factors on the kinematic properties of the movement. Knowing these separate influences, it is possible to analyze experimental observations from real speakers and reveal how motor control integrates biomechanical constraints to achieve speech signals that are correctly perceived.
Model-based evaluations of the influence of the dynamics of the vocal tract articulators on speech production have been provided in a number of past studies, in which articulators’ dynamics was modeled by a second order system. The authors have in particular investigated the link between articulatory stiffness and clarity of speech production (Browman and Goldstein, 1985; Kelso et al., 1985; Perrier et al., 1996). However, as explained above, biomechanics means much more than dynamics, and the dynamics of the orofacial motor system is much more complex than the one described by a second order system (see Fuchs et al., 2011, for a quantitative evaluation of this specific aspect).
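As a point of orientation, a second order system of this kind can be pictured as a mass-spring-damper driven toward an articulatory target. The sketch below is only illustrative: the function name, the parameter values, the choice of critical damping and the integration scheme are assumptions made here, not settings taken from the cited studies.

```python
import math

def simulate_second_order(target, stiffness, mass=1.0, damping=None,
                          x0=0.0, dt=0.001, duration=0.5):
    """Integrate m*x'' + b*x' + k*(x - target) = 0 with semi-implicit Euler."""
    if damping is None:
        # Critical damping is an assumption made for this illustration.
        damping = 2.0 * math.sqrt(stiffness * mass)
    x, v = x0, 0.0
    trajectory = [x]
    for _ in range(int(round(duration / dt))):
        a = (-damping * v - stiffness * (x - target)) / mass
        v += a * dt          # update velocity first ...
        x += v * dt          # ... then position with the new velocity
        trajectory.append(x)
    return trajectory

# Two hypothetical stiffness settings (arbitrary units) for the same target.
soft = simulate_second_order(target=10.0, stiffness=400.0)
stiff = simulate_second_order(target=10.0, stiffness=1600.0)
```

With these made-up values the stiffer system approaches the target faster, which is the kind of stiffness-dependent behavior that such models were used to study.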
A number of studies have investigated the influence of more complex biomechanical properties of the peripheral motor system on movement trajectories and motor control strategies. Flanagan et al. (1990) have shown that the gently curved shape of the arm trajectories observed in reaching tasks could be the consequence of the motor system dynamics. Perrier et al. (2003) have suggested that the looping patterns observed in tongue movements during the production of [aka], [aku] or [aki] speech sequences (Mooshammer et al., 1995) could arise from a combination of the effects of the dynamics of the tongue and of the muscle arrangements acting on this articulator. Gribble et al. (1996) for arm movements, and Perrier and Fuchs (2008) for tongue movements during speech production, have provided convergent evidence that the relation between trajectory curvature and tangential velocity that is observed in human movements (the so-called 2/3 power law proposed by Viviani and Stucchi, 1992), could result from global dynamical properties of the arm and the tongue. Perrier et al. (2000) have shown that the main directions of tongue deformation for vowels in various languages (Harshman et al., 1977; Jackson, 1988) correspond to the main directions of the mechanical influences of the synergies between tongue muscles (see also Fuchs and Perrier, 2005).
Nazari et al. (2011) have shown that tissue stiffening in the lips due to the activation of the Orbicularis Oris muscle (see below for more details about this muscle) would significantly help in the achievement of the protrusion and rounding gesture required for the production of /y/ or /u/ in French. Franklin et al. (2007) have experimentally found that in reaching arm movements toward a target the central nervous system (CNS) adjusts muscles’ activities so that the arm at target is the least vulnerable to perturbing forces. For this to happen the CNS adjusts the direction of the largest arm stiffness so that it matches the direction along which the reaching task requires the greatest accuracy. In the same vein, Cos et al. (2011) asked human subjects to choose between two potential reaching movements that shared the same ultimate target, but had different characteristics in terms of path distance and mechanical stability at the target. The subjects selected the movements that provided the better stability at the target. Cos et al. (2011) have thus shown that the knowledge of the biomechanical properties of the arm at the target influences decision-making processes in the production of movements.
In North American English, the articulation of the sound /r/ exhibits a noticeable contextual variability for some speakers. In the context of the vowel /i/, /r/ is produced with a bunched tongue having its highest point in the velar region. In the context of the vowel /a/, /r/ is produced with a tip-up tongue shape having its highest point in the alveolar region. Using simulations with a 3D biomechanical model of the tongue (Buchaillard et al., 2009), Stavness et al. (2012) have shown that this co-occurrence can be explained by the fact that it minimizes the change of the stress within the tongue from /r/ to the vowel /i/ or /a/.
Since biomechanics has been shown to influence both motor control strategies and the kinematic properties of movements, it is tempting to think that variability across speakers in the biomechanical characteristics of the orofacial motor system could contribute to the emergence of speaker-specific speech characteristics, also called speaker idiosyncrasies. In this paper we will focus on speaker-specific aspects of the kinetics of the orofacial motor system. Kinetics includes a description of the mechanisms underlying the generation of muscle forces, and an account of the directions in which these forces are applied. It also integrates the external force field acting on the body. Since most muscles are attached to the skeleton, it is easy to understand that the morphology of the skeleton, namely the size and the shape of the bones, their articulations with each other, i.e. the anthropometry, significantly determines the biomechanical properties. This is particularly true for vocal tracts in adults: first, because the shapes of the head and the neck determine the shape of the tongue (Fitch and Giedd, 1999) and the direction of the tongue muscle fibers and of their associated forces; and second, because the palate and the tongue interact mechanically through contact forces, in particular during consonant production. Hence, in order to study speaker-specific aspects of the kinetics, models have to include a description of the skull and a description of the muscles and muscle force generation mechanisms. Such models are called biomechanical models.
In this paper we present two modeling studies based on two kinds of biomechanical models, in which the influence of speaker-specific characteristics will be assessed with simulations. In the first section, some basics in orofacial anatomy will be presented that will facilitate the understanding of the design and the use of the biomechanical models presented in the subsequent sections. In the second section, we will present an assessment of the influence of inter-speaker variations in the global morphology of the skull and neck on the shape of the tongue and on the control of vowel /i/. This assessment is based on a simplified 2D biomechanical model of the vocal tract, which is adapted to the morphology of two different speakers according to the method proposed by Winkler et al. (2011b). In the last section, an assessment of the influence of potential inter-speaker variations in the Orbicularis Oris anatomy on the lip protrusion gesture will be summarized, which is based on simulations run by Stavness et al. (2013) with a quite complex 3D biomechanical model of the face (Nazari et al., 2010).
2. Some basics in orofacial muscle anatomy
In this section we provide basic information about the anatomy of the tongue and face that will be useful for understanding the modeling work presented below. This description contains a number of simplifications of the quite complex anatomical reality. For the tongue a very accurate description can be found in Takemoto (2001).
Figure 1: Representation of the main muscles acting on the mobile part of the tongue. Upper panel: view from the left hand side; bottom panel: transversal cut of half the tongue (from the left to the right) seen from the front (from Gray, 1918 in Bartleby.com, 2000).
The mobile part of the tongue is controlled by eight muscles that are represented in Figure 1. Four of these muscles are considered to be extrinsic muscles, because at one of their extremities they attach to structures that are external to the tongue. They are as follows: Genioglossus, in the central part of the tongue, which originates from the inner mandibular surface at the Symphysis (bottom left of the top panel); the Styloglossus which emanates from the styloid process in the temporal region of the head (upper right of the top panel); the Hyoglossus originating from the greater horns of the hyoid bone (bottom right of the top panel); and the Palatoglossus (not represented in this figure) emanating from the anterolateral palatal aponeurosis in the soft palate. The other four muscles are intrinsic, since both extremities are within the tongue (see in particular the bottom panel in Figure 1): the Longitudinalis Superior, the Longitudinalis Inferior, the Transversus and the Verticalis (not represented in this figure). Not listed here, other muscles located in the mouth floor act indirectly on the tongue, in particular muscles involved in hyoid bone movement. The fiber directions of the extrinsic muscles are influenced by the shape of the tongue and also by the morphology of the jaw, the hyoid bone and the temporal bone, while fiber directions of the intrinsic muscles only depend on the tongue shape.
The lip shape can be modified by the control of 11 orofacial pairs of muscles (see Figure 2) located symmetrically on both sides of the mid-sagittal plane. According to their influence on the lips, they are classified into the upper lip elevators (Levator Labii Alaeque Nasi, Levator Labii Superioris and Zygomaticus Minor), the lip corner mobilizers (Levator Anguli Oris, Zygomaticus Major, Risorius, Buccinator and Depressor Anguli Oris), the lower lip mobilizers (Depressor Labii Inferioris and Mentalis, not represented in Figure 2) and the oral fissure constrictors (Peripheralis and Marginalis parts of the Orbicularis Oris). All these muscles originate from bony structures of the skull, except the Orbicularis Oris muscle, which emanates from a lip corner and inserts into the opposite corner of the lips (muscular tissue). The Orbicularis Oris muscle is composed of an upper and a lower part.
Figure 2: Schematic representation of the muscles determining the shape of the lips. These muscles are grouped in pairs located symmetrically on both sides of the mid-sagittal plane, but for the sake of simplicity only one side of each muscle pair is represented. Reprinted from Journal of Anatomy 214(1), 36-44, Rogers, C.R., Mooney, M. P., Smith, T. D., Weinberg, S. M., Waller, B. M., Parr, L. A., Docherty, B. A., Bonar, C. J., Reinholt, L. E., Deleyiannis, F. W.-B., Siegel, M. I., Marazita, M. L., and Burrows, A. M. Comparative microanatomy of the orbicularis oris muscle between chimpanzees and humans: evolutionary divergence of lip function. Reproduced with permission from John Wiley and Sons. Copyright 2008.
3. Inter-speaker variation in extrinsic tongue muscles orientation
In this section a 2D biomechanical model of the tongue is used to assess the impact of inter-speaker variations in head and neck morphology on the tongue muscle fibers’ directions and on the patterns of articulatory and acoustic variability in the production of the high front vowel /i/. Vowel /i/ has been chosen for this evaluation because its production requires precise tongue positioning.
We used the 2D biomechanical model of the tongue developed by Payan and Perrier (1997) in its most recent version (Perrier et al., 2003). It mainly consists of a deformable Finite Element Mesh (FEM) embedded in rigid vocal tract walls in the mid-sagittal plane. The 2D mesh is a simplified representation of the 3D tongue structure. It is considered to be a projection of the tongue in the mid-sagittal plane. The geometry of the mesh (see Figure 3) is specifically designed to facilitate the anatomical representation of the muscles acting on the position and shape of the tongue in the front-back direction. The external contour of the mesh was derived from an X-ray view of the vocal tract of a male speaker at rest (close to a schwa production). Five muscles are represented: the Genioglossus, the Styloglossus, the Hyoglossus, the Verticalis and the Longitudinalis Inferior. The Genioglossus has been divided in two functional parts, the Posterior and the Anterior Genioglossus. Muscle activations are controlled according to the λ-model (Feldman, 1986), which generates a force for each muscle that is a function of the difference between the motor control variable λ specified for this muscle and the actual muscle length. If the actual length is smaller than λ no active muscle force is generated. If the actual muscle length is equal to or larger than λ the force develops as an increasing function of the actual muscle length. In sum, in a given static position of the tongue, in which a muscle M has the length l, the force FM generated by the muscle varies with the motor control variable λ according to the equation:
$$F_M(\lambda) = \begin{cases} \rho\left[\exp\big(c\,(l-\lambda)\big) - 1\right] & \text{if } l \geq \lambda \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
where c is a form parameter and ρ is an amplitude parameter directly related to the force generation capacity of the muscle (for more details see: Laboissière et al., 1996; Payan and Perrier, 1997). λ can be seen as the threshold muscle length above which muscle force starts developing. In spite of its simplicity this 2D biomechanical model has been shown to be capable of accounting for some important kinematic characteristics of speech articulation, which have been experimentally observed in different speakers of different languages: velocity profiles (Payan and Perrier, 1997), trajectory shapes (Perrier et al., 2003), or relations between trajectory curvature and speed (Perrier and Fuchs, 2008).
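The static force law can be sketched in a few lines. The sketch assumes the exponential form commonly used in λ-model implementations (zero force below the threshold length, exponential growth above it); the function name and the default values of `rho` and `c` are placeholders chosen here, not the parameters of the model described in this paper.

```python
import math

def muscle_force(length_mm, lam_mm, rho=1.0, c=0.1):
    """Static active muscle force under an assumed exponential lambda-model law.

    length_mm: actual muscle length l
    lam_mm:    threshold length (motor control variable lambda)
    rho, c:    amplitude and form parameters (placeholder values)
    """
    stretch = length_mm - lam_mm
    if stretch <= 0.0:
        # At or below the threshold length no active force is generated.
        return 0.0
    return rho * (math.exp(c * stretch) - 1.0)

# Lowering lambda below the current length recruits the muscle.
print(muscle_force(50.0, 52.0), muscle_force(50.0, 48.0))
```

In this reading, decreasing λ for a fixed muscle length is what increases the activation of the muscle.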
This model serves as a reference model, from which speaker-specific 2D biomechanical models can be routinely developed according to the method proposed by Winkler et al. (2011b). Two basic hypotheses underlie the adaptation of the model to a specific speaker: (1) the general anatomical arrangement accounted for by the mesh geometry in the reference model is common to all human beings, (2) variations across speakers in muscle lengths and muscle orientations within the tongue are strongly correlated with global variations of the head and neck morphology, such as variations in larynx height, length of the mandible ramus, head size, and mid-sagittal palate shape. Taking these assumptions into account, the transformation of the reference model requires contours reflecting the vocal tract morphology and anatomical landmarks corresponding to muscle fiber origins. The two contours are the tongue contour at rest, and the mid-sagittal external contour including the upper lip, the palate, the soft palate and the pharyngeal walls. The three landmarks are the lower (P1) and upper (P2) limit of the tongue where the Genioglossus emanates from the mandibular Symphysis, and the Styloid process (P3) (see Figure 3 for a representation of these landmarks on the reference model).
Once these anatomical landmarks are determined on the speaker (see below for details), the generation of the speaker-specific biomechanical model is straightforward. First, the upper contour of the tongue model is projected onto the mid-sagittal tongue contour measured for the subject. Second, the distribution of the nodes along this new upper contour is made proportionally to the distribution of the nodes in the reference model. Third, the lower and upper attachment points of the new tongue mesh on the mandible are assigned to points P1 and P2. Then, the distribution of the nodes within the mesh is obtained by deforming the original mesh linearly from the nodes on the upper contour to the insertion nodes P1 and P2 of the mesh into the mandibular Symphysis. The difference in size and orientation of the segment [P1 P2] between the reference model and the speaker-specific morphology serves to transform the size and the orientation of the incisor representing the mandible in the sagittal plane. Finally, the extremity of the external Styloglossus fiber is attached to point P3. This matching procedure fully determines the geometry of the new mesh and consequently the muscle arrangement within the speaker-specific tongue model. It preserves the original topology of the mesh while accounting for the speaker-specific morphology.
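The second step of this procedure, redistributing the contour nodes proportionally, can be illustrated with a small sketch. It assumes that "proportionally" means preserving each node's normalized arc-length position along the contour; this interpretation, the function names and the toy coordinates are assumptions made here and are not taken from Winkler et al. (2011b).

```python
import math

def arc_fractions(points):
    """Normalized cumulative arc length (0..1) of each point along a polyline."""
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    return [s / total for s in cum]

def point_at_fraction(contour, fraction):
    """Point located at a given arc-length fraction along a polyline."""
    fracs = arc_fractions(contour)
    for i in range(1, len(contour)):
        if fraction <= fracs[i]:
            span = fracs[i] - fracs[i - 1]
            t = 0.0 if span == 0.0 else (fraction - fracs[i - 1]) / span
            (x0, y0), (x1, y1) = contour[i - 1], contour[i]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return contour[-1]

def redistribute_nodes(reference_nodes, speaker_contour):
    """Place nodes on the speaker contour at the reference arc-length fractions."""
    return [point_at_fraction(speaker_contour, f)
            for f in arc_fractions(reference_nodes)]

# Toy example: three reference nodes mapped onto a straight speaker contour.
reference_nodes = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
speaker_contour = [(0.0, 0.0), (0.0, 8.0)]
new_nodes = redistribute_nodes(reference_nodes, speaker_contour)
```

Because only node positions change and node ordering is kept, a mapping of this kind leaves the mesh topology untouched, in line with the property stated above.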
Figure 3: The reference 2D biomechanical model of the vocal tract. The tongue and jaw position correspond to the positions observed at rest for the reference subject with X-Ray imaging from the side. The anatomical landmarks that serve as a basis for the transformation of this model into a speaker-specific model are indicated.
Just as in the reference model, the speaker-specific tongue mesh obtained after the adaptation procedure represents the tongue at rest. The lengths of the muscles in this rest configuration determine speaker-specific reference λ commands. For all speakers, if the λ commands are equal to the reference values, the force generated by each muscle in the rest configuration is equal to zero. These reference λ commands are used to establish the correspondence between the motor commands used in the speaker-specific models and in the reference model: we consider the motor commands to be equal in all the models if the differences δλ between the actual λ values and the speaker-specific reference λ commands are equal.
In order to generate the acoustic signal associated with a given vocal-tract configuration, the 2D mid-sagittal representation of the vocal tract has to be converted to its corresponding area function. This is accomplished first by determining the variation of the mid-sagittal distance from the glottis to the lips. Then, the area function is computed from the sagittal distance by applying an enhanced version of the model proposed by Perrier et al. (1992). The mid-sagittal distance is measured on a grid that is projected on the geometry of the biomechanical model. For the speaker-specific model a grid derived from the grid proposed in Perrier et al. (1992) is used. The grid is divided into a pharyngeal section from the glottis to the velum, and a palatal section from the velum to the lips. The interval between the lines of the grid and the angle between the pharyngeal and the palatal part have been adapted in order to match the length and angle characteristics of the speaker to whom the model is adapted. Then, the exact same procedure was applied for all the models to compute the area function from the sagittal distance. Doing so, we do not account for inter-speaker differences in the transversal direction, i.e. the direction orthogonal to the mid-sagittal plane. This choice is justified by the fact that we want to only assess inter-speaker differences associated with the biomechanical specificities accounted for in the model.
Finally, the acoustic signal is generated from the area function with a reflection-type line analog of the vocal tract (Story et al., 2000). Vocal fold oscillations are generated and controlled with a numerical implementation of the three-mass model designed by Story and Titze (1995) based on lumped-elements (Titze and Story, 2002).
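The conversion from sagittal distance to area can be sketched as follows, assuming a power-law relation A = α·d^β with coefficients that depend on the position along the vocal tract. Both the functional form as written here and the coefficient values are assumptions used for illustration; they are not the coefficients of the enhanced model used in this study.

```python
def area_function(sagittal_distances_cm, alphas, betas):
    """Convert mid-sagittal distances to cross-sectional areas, section by section.

    Each grid line i has its own (alpha_i, beta_i) pair, so that the
    pharyngeal and palatal regions can follow different relations.
    """
    if not (len(sagittal_distances_cm) == len(alphas) == len(betas)):
        raise ValueError("one (alpha, beta) pair is needed per grid line")
    return [a * d ** b
            for d, a, b in zip(sagittal_distances_cm, alphas, betas)]

# Placeholder values for two grid lines (not published coefficients).
areas = area_function([1.0, 2.0], alphas=[1.5, 2.0], betas=[1.5, 2.0])
```

Applying the same coefficients to all models, as described above, means that differences in the resulting area functions come only from the sagittal geometry.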
In order to illustrate with this procedure the potential impact of speaker-specific biomechanics on speech production, we have focused on the production of vowel /i/. Vowel /i/ is interesting for three main reasons: (1) it is an extreme vowel that exists in all the languages of the world (Ladefoged and Maddieson, 1996); (2) the correct acoustic realization of this vowel requires a precise position of the tongue along the palate (Gay et al., 1992); and (3) the articulation of this vowel requires mainly the activation of the posterior Genioglossus and the Styloglossus (see for example Buchaillard et al., 2009), two muscles that are likely to be significantly impacted by the variation of the head and neck morphology across speakers. We have focused on the variation in articulation and in acoustics associated with local variations of the activations of the Posterior Genioglossus, the Styloglossus, the Anterior Genioglossus and the Hyoglossus. These muscles have been shown to be the most important for tongue position control in vowel production (Honda, 1996).
We first determined for each model a tongue configuration corresponding to a prototypical /i/. This prototype was obtained in two steps. First, 1000 tongue configurations were generated by a random sampling of the 6-dimensional space of the motor control variables (the λ-space), expressed in terms of their differences with the reference λ commands in the tongue rest position. Among these 1000 configurations one configuration was selected for which the formant patterns and the sagittal view of the model corresponded to the vowel /i/. Second, starting from this /i/ configuration, we adjusted step-by-step the λ values of the Posterior Genioglossus muscle and the Styloglossus muscle in order to improve the characteristics of the vowel /i/. The criteria are that a prototypical /i/ is characterized by a narrow constriction in the alveolar region and by the highest possible value of the second formant. For each model our standard /i/ configuration had these two basic characteristics. For each model different articulatory configurations were generated around the corresponding prototypical /i/ configuration, by changing the motor control variables to the Posterior Genioglossus, the Styloglossus, the Anterior Genioglossus and the Hyoglossus within a range of variation of δλ, the difference between the actual λ values and the reference λ commands at rest, equal to [-2 mm +2 mm] with a 1-mm-step. Thus, five different λ values have been used for each muscle and all the combinations of the λ values of the four muscles were used (5^4 = 625 articulatory configurations). Finally the variation in the sagittal plane and in the acoustic domain was assessed and compared across speakers.
This methodology was applied to two speakers, a female speaker S1 and a male speaker S2. These two speakers were selected from a set of 13 subjects for whom MRI anatomical data were available (Apostol, 2001), because they are quite representative of the vocal tract differences between female and male speakers. The results of the simulations obtained for these two models and for the reference model are presented and analyzed.
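The exhaustive combination of five values for each of four muscles can be enumerated directly. In the sketch below the muscle labels and the prototype δλ values are hypothetical, and the ±2 mm range is read as an offset around the prototypical /i/ commands, which is an interpretation made here.

```python
from itertools import product

# Hypothetical short labels for the four muscles that are varied.
MUSCLES = ("GGP", "STY", "GGA", "HYO")
DELTA_STEPS_MM = (-2, -1, 0, 1, 2)   # 1-mm steps between -2 mm and +2 mm

def perturbation_grid(prototype_delta):
    """All combinations of per-muscle offsets around a prototype command set."""
    configs = []
    for offsets in product(DELTA_STEPS_MM, repeat=len(MUSCLES)):
        configs.append({m: prototype_delta[m] + d
                        for m, d in zip(MUSCLES, offsets)})
    return configs

# Made-up prototype delta-lambda values (mm), for illustration only.
prototype = {"GGP": -8.0, "STY": -5.0, "GGA": 0.0, "HYO": 0.0}
grid = perturbation_grid(prototype)
print(len(grid))  # 5**4 combinations
```

Each dictionary in `grid` would then be passed to the biomechanical model to obtain one articulatory configuration and its formants.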
Figure 4: Adapted 2D biomechanical models of the vocal tract (left panels, lips are on the left hand side) and mid-sagittal MR images of the vocal tract at tongue rest position (right panels), for subjects S1 (top) and S2 (bottom). For comparison with the Reference Model see Figure 3.
The geometrical transformation of the reference model into the speaker-specific model induces changes in the direction of the muscle fibers. In Figure 5, we can observe these changes for the two main muscles involved in the production of vowel /i/, the Posterior Genioglossus (left) and the Styloglossus (right). For the Posterior Genioglossus few differences are observed between S1 and the reference model; for S2 the lower fibers of this muscle are more inclined than in the reference model. For the Styloglossus muscle the two speakers S1 and S2 present external fibers that are clearly more vertical than in the reference model. This phenomenon is stronger for S1 than for S2. Accordingly we expect the Styloglossus muscle to generate movements of the tongue that are more vertical and less horizontal in S1 than in S2 and in the reference model.
Figure 5: Fibers’ implementation of the Posterior Genioglossus (left panels) and of the Styloglossus (right panels) in the 2D biomechanical models of the vocal tract. The circles on the edges of the elements of the mesh describe the path of the fibers in the mesh. The solid lines joining the Styloid process (circle on the upper right corner of each panel) represent the fibers that are external to the tongue. The crossed elements in the mesh correspond to the muscle body. Their stiffness increases when the muscle is activated. From the top to the bottom: reference model; Subject S1; Subject S2.
In Figure 6, the vocal tract configuration selected for vowel /i/ is represented for each model. All of them have a constriction in the alveolar region, but the length of the constriction along the front/back direction varies across the models, due to differences in tongue shapes and in palatal contours. Speaker S1 seems to have a longer constriction than speaker S2 and the reference model. This is confirmed by the computation of the area functions (see Figure 7). The force produced by each muscle in this configuration was computed as described in equation 1. For the reference model the ratio between the force exerted by the Posterior Genioglossus and the one exerted by the Styloglossus is equal to 0.75. The same ratio is found for S1, but this ratio is equal to 1.9 for S2. This difference is consistent with the fact that at rest the tongue and jaw are lower, i.e. the tongue is farther from the occlusal plane, for S2 than for S1 and the reference model (see Figures 3 and 4).
Figure 6: Mid-sagittal views of vowel /i/ generated by the speaker-specific 2D biomechanical models and the Reference Model of the tongue (see Figure 5). The dotted lines show rough estimations of the constriction’s boundaries. Top panel: Reference Model. Bottom panels: Left, subject S1; Right: subject S2. Lips are on the left hand side.
Figure 7: Variations of the articulatory configuration observed for vowel /i/ when the activation of the Styloglossus varies. The left panels show tongue positions in the mid-sagittal plane, in the region of the constriction (lips are on the left hand side). The right panels show the variation of the area function in the region of the constriction (front is on the right hand side). The dotted arrows superimposed on the plots of the area functions give the main directions of the area changes in the constriction’s region. The size of the arrows corresponds to the amplitude of the area change. From the top to the bottom: Reference Model; Subject S1; Subject S2.
Figure 7 presents the results of the random variation of the motor control variables λ to the four main muscles, according to the methodology described above. The left panels represent the tongue contour variations in the mid-sagittal plane with a focus on the palatal region. The right panels represent the corresponding variations of the area function, focusing on the region of the constriction. The main direction of the tongue contour variations changes across the models. For the reference model, the variation in the mid-sagittal plane is essentially along a front/high-back/low direction. This is associated with a change in the constriction opening. For speaker S1, the variation in the mid-sagittal plane is two-fold: the variation of the opening of the constriction in the alveolar region is significantly larger than the variation of the constriction in the post-alveolar region. The consequence for the area function is that the opening/closing of the front part of the constriction is associated with a relative closing/opening of the back part of the constriction. Thus, variations in the main muscle activations are associated with a change in the main constriction location. For speaker S2, the pattern of variation is intermediate between speaker S1 and the reference model. The main trend is a global opening of the constriction, but the narrowest part of the constriction moves backwards. A detailed analysis of the effects of the four different muscles taken separately has revealed that these patterns of variation are mainly associated with the action of the Styloglossus muscle. This statement is consistent with the differences across speakers in the direction of the external fibers of the Styloglossus observed in Figure 5: for speaker S1, the orientation of these fibers is more vertical than for S2 and the reference model, and the vertical component associated with changes in muscle activations is stronger.
For speaker S1, the increase in Styloglossus activation creates a constriction just behind the original place of constriction. A similar but weaker trend also exists in S2 relative to the reference model.
The acoustic variations associated with the articulatory variations shown in Figure 7 are depicted in Figure 8. Note that the scaling of the figures is the same for the three models. The differences in the main orientations of the dispersion ellipses across models indicate the main impact of the biomechanical differences in the acoustic domain.
For S1 the variability along the F3 dimension is clearly stronger and the variability along the F1 direction clearly smaller than for S2 and, to a lesser extent, the reference model. This is consistent with the fact that in S1 the constriction location moves along the front/back direction due to the orientation of the force exerted by the Styloglossus relative to the palatal contour, while its cross-section changes less than for the other two models. S2 has the largest variability in the (F2, F1) space and the smallest variability along the F3 dimension. The reference model is intermediate. For a correct perception of vowel /i/, reaching a high F3 value is important (see for example Schwartz et al., 1993). Hence, these simulations suggest that model S1 requires a more accurate control of the Styloglossus muscle activation than the other two models.
Obviously, for a comprehensive analysis, the influences of the other muscles should be taken into consideration. We also cannot rule out the possibility that our observations are linked with the particular standard configuration chosen for vowel /i/, even if the results are very consistent with the differences observed in the Styloglossus fibers’ orientation across speakers. It is not possible to draw strong conclusions from this limited study. Our results merely aim to illustrate how speaker-specific biomechanics can influence motor control strategies and could explain in part some trends in idiosyncrasies.
Figure 8: Variability in the (F2, F1) (left panels) and (F2, F3) planes (right panels) associated with local variations of the four main muscle activations for vowel /i/. The ellipses represent the 2σ dispersion ellipses inferred from the data dispersion assuming a normal distribution. From the top to the bottom: Reference model; Subject S1; Subject S2.
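As an illustration of how 2σ dispersion ellipses such as those of Figure 8 can be obtained from a cloud of formant values, the following minimal sketch computes the center, semi-axes and orientation of the ellipse from the sample covariance matrix of two variables. It is not the authors' code: the function name dispersion_ellipse and the synthetic (F2, F1) values are assumptions made for illustration only.

```python
import math
import random


def dispersion_ellipse(xs, ys, n_sigma=2.0):
    """Return (center, semi_major, semi_minor, angle_deg) of the n-sigma
    dispersion ellipse of the points (xs[i], ys[i]).

    Hypothetical helper: the semi-axes are n_sigma times the square roots
    of the eigenvalues of the 2x2 sample covariance matrix.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Unbiased sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against round-off
    # Orientation of the major axis with respect to the x axis
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (
        (mx, my),
        n_sigma * math.sqrt(lam_major),
        n_sigma * math.sqrt(lam_minor),
        math.degrees(angle),
    )


# Synthetic (F2, F1) values in Hz, invented for this example only
random.seed(0)
f2 = [random.gauss(2100.0, 60.0) for _ in range(200)]
f1 = [random.gauss(300.0, 15.0) for _ in range(200)]
center, a, b, angle_deg = dispersion_ellipse(f2, f1)
print(center, a, b, angle_deg)
```

Comparing the orientation and the axis lengths of such ellipses across models is one simple way to quantify the differences in variability discussed for Figure 8.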
4. A modeling study of anatomical variability in Orbicularis Oris
In articulatory phonetics, lip protrusion is considered to be the basic gesture underlying the production of rounded vowels such as /u/ or /o/. The acoustic characteristics of rounded vowels as compared to unrounded or spread vowels are well described and consistent across many languages. They correspond to an increase of the spectral energy in the low frequencies and a decrease in the high frequencies. However, the actual gesture underlying the production of rounded vowels can vary significantly across speakers. For many speakers, the lips are protruded to the front and the lip orifice has a small area and is round. For other speakers, the lips are not protruded; the lip orifice has a small area but it is not round. Stavness et al. (2013) provided two characteristic examples of these two different articulatory strategies (see their Figures 1 and 2, p. 879). They investigated the potential contribution of anatomical variability in the distribution of the muscle fibers between the Peripheralis and the Marginalis parts of the Orbicularis Oris. Facial muscles present a non-negligible variability across humans. Stavness et al. (2013) cited, for example, the study of Huber (1933), who found that the Risorius muscle (see Figure 1) exists in only 20% of Melanesians and in 80% of Europeans. They also referred to Pessa et al. (1998), who observed that, among the 50 specimens that they studied, 17 presented a Zygomaticus Major muscle with a bifid structure, i.e. with two insertion points on the skull. This peculiarity could be responsible for the dimple in the cheeks that many people have when smiling. To our knowledge, no study has shown that significant differences exist among humans in the morphology of the Orbicularis Oris muscle.
Nevertheless, since the emergence of distinct Marginalis and Peripheralis parts in this muscle seems to be quite recent in the primates’ development (Rogers et al., 2009), it is not unlikely that such variability exists. Citing Ladefoged (1984), Stavness et al. (2013) suggest that such variability would be consistent with the fact that individual differences in facial expressions are compatible with individual differences in lip shaping during speech production.
The investigation was based on simulations run with a sophisticated 3D Finite Element biomechanical model of the face (Nazari et al., 2010, 2011; Stavness et al., 2014). This model includes a 3D anatomical representation of all the muscles that are mentioned in section 1 and displayed in Figure 2. Muscle mechanics is accounted for with a Finite Element implementation of a Hill-type muscle model (Blemker et al., 2005). Details about the parameterization of the muscle model can be found in Stavness et al. (2013). The Orbicularis Oris muscle is represented as a continuous loop of elements around the labial orifice as depicted in Figure 9.
In order to evaluate the influence of the anatomical variability in this muscle, Stavness and colleagues performed two different sets of simulations:
1. To evaluate the influence of the depth of the muscle implementation they considered simulations with active elements located only in the deep (D), or in the middle (M), or in the superficial (S) layer of the mesh (see Figure 9, bottom right panel).
2. To evaluate the influence of the size of the muscle implementation they considered simulations with active elements of various sizes, from marginal to peripheral (1, 2, 3, 4 on Figure 9, bottom right panel).
Simulations were performed in ArtiSynth (http://artisynth.magic.ubc.ca/artisynth/), which is a 3D platform for fast forward-dynamics simulation with dynamic coupling between rigid body and soft Finite Element models as well as collision handling. Each simulation was 500 ms in duration. Muscle activation increased linearly up to 400 ms and was then held at its final level for 100 ms. In all simulations, muscle activation was increased uniformly from 0% to 50% of the maximum possible activation, which corresponds to an active muscle stress of 50 kPa. This level of final activation was chosen to ensure numerical convergence in all simulations while generating lip displacements of realistic amplitudes. Each simulation reached an equilibrium position by 500 ms.
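The activation time course described above can be sketched as a simple ramp-and-hold function. This is only an illustrative sketch and does not use the ArtiSynth API: the function names are hypothetical, the maximum active stress of 100 kPa is inferred from the statement that 50% activation corresponds to 50 kPa, and the linear scaling of stress with activation is an assumption.

```python
RAMP_END_MS = 400.0             # end of the linear ramp (from the text)
SIMULATION_END_MS = 500.0       # total simulation duration (from the text)
FINAL_ACTIVATION = 0.5          # 50% of the maximum possible activation
MAX_ACTIVE_STRESS_KPA = 100.0   # inferred: 50% activation <-> 50 kPa


def activation_level(t_ms):
    """Fraction of maximum activation at time t_ms: linear ramp, then hold."""
    if t_ms <= 0.0:
        return 0.0
    if t_ms >= RAMP_END_MS:
        return FINAL_ACTIVATION
    return FINAL_ACTIVATION * t_ms / RAMP_END_MS


def active_stress_kpa(t_ms):
    """Active muscle stress in kPa, assuming it scales linearly with activation."""
    return MAX_ACTIVE_STRESS_KPA * activation_level(t_ms)


# Sample the profile every 100 ms over the whole simulation
for t in range(0, int(SIMULATION_END_MS) + 1, 100):
    print(t, activation_level(float(t)), active_stress_kpa(float(t)))
```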
Figure 9: Front (left panels) and side (right panels) views of the face model showing the Orbicularis Oris muscle elements organized into different peripheral loops from marginal to peripheral (1, 2, 3, 4), and into different depth layers in the mesh, superficial (S), middle (M), and deep (D). Reprinted from Journal of Speech, Language, and Hearing Research, 56(3), 878–890, Stavness, I., Nazari, M. A., Perrier, P., Demolin, D., and Payan, Y. A biomechanical modeling study of the effects of the orbicularis oris muscle and jaw posture on lip shape. Reproduced with permission from the American Speech-Language-Hearing Association (http://jslhr.pubs.asha.org). Copyright 2013.
The results of the different simulations are summarized in Figure 10. Each row represents how lip shape varies, for a given deepness, when the implementation changes from marginal to peripheral. Each column shows the influence of deepness, from superficial to deep, for a given peripheralness. In each panel a front view of the lip horn is presented on the left and a side view is presented on the right. Protrusion can be seen on the side view. The area and the shape of the lip orifice can be seen on the front view. We can see that large lip shape variability is associated with the implemented anatomical variability. A deep implementation tends to reduce the vertical dimension of the labial orifice. This is probably due to the fact that the contraction of the deep part of the muscle generates an inward displacement of the whole labial tissue, while a superficial implementation only acts on the superficial labial tissues. More peripheral implementations are associated with larger lip protrusion. This can be explained by the combination of the effects of the teeth, as a rigid obstacle, and the quasi-incompressibility of the labial tissues. Labial tissues tend to keep their global volume nearly constant whatever the stress applied to them. When the labial tissues are compressed in the region close to the teeth (the more peripheral one), the compensatory expansion of the volume in the other parts of the tissues can only occur in the front direction, since the teeth block the expansion in the back direction. For the other degrees of peripheralness, the volume expansion can occur in both directions.
Interestingly, deepness influences the impact of the degree of peripheralness on lip aperture: for a superficial implementation, lip aperture varies monotonically with the increase of peripheralness; for a middle implementation the aperture increases from peripheralness degree 1 to degree 3, and decreases from degree 3 to degree 4; for a deep implementation the peripheralness has little impact on the very small lip aperture. The most prototypical lip shape corresponding to a French rounded vowel like /u/ or /y/ is only observed for a middle deepness and a middle peripheral (levels 3 and 4) implementation of the Orbicularis Oris.
This set of simulations illustrates how individual variation in Orbicularis Oris muscle anatomy could influence the gesture underlying the production of rounded vowels. In subjects having a quite marginal Orbicularis Oris implementation, it seems more difficult to generate a protrusion of the lips and to achieve a small round lip orifice. Again, biomechanics determines the constraints applied to the achievement of a gesture, and the central nervous system can elaborate different motor control strategies to deal with these constraints. Hence, it is possible that lip protrusion and a round lip orifice are achieved in spite of a marginal implementation of the Orbicularis Oris. However, such a marginal implementation could make these gestures more complex, with the consequence that they would be observed less often than in subjects with a more peripheral implementation.
Figure 10: Simulation results for different Orbicularis Oris muscle deepness and peripheralness. Reprinted from Journal of Speech, Language, and Hearing Research, 56(3), 878–890, Stavness, I., Nazari, M. A., Perrier, P., Demolin, D., Payan, Y. A biomechanical modeling study of the effects of the orbicularis oris muscle and jaw posture on lip shape. Reproduced with permission from the American Speech-Language-Hearing Association (http://jslhr.pubs.asha.org). Copyright 2013.
The rare studies of the influence of individual biomechanical factors on subject-specific motor control strategies in very skilled motor tasks have shown that this influence is limited. Frère and Hug (2012) studied nine high-level gymnasts with different morphologies during backward giant swings on the high bar. They computed the correlations between their different muscle activities in order to extract synergies, independently for each gymnast. They found that the nine subjects shared the same first two main synergies. Differences started to be significant only from the third most important synergy. A similar observation was made by Hug et al. (2010), who studied muscle synergies in eleven highly trained cyclists in an experimental protocol in which the torque they had to counteract, the torque-velocity relation, and their posture varied significantly. These authors found that the first three synergies remained the same across conditions. Since speech production is also a highly skilled motor task we expect similar findings. We believe that the best-known synergies observed in speech production are certainly shared by the vast majority of humans. We believe that biomechanical influences are more subtle and affect the balance within the synergies, and the sensitivity of the articulatory configuration to small variations in muscle activation. Thus, speaker-specific biomechanical properties can influence the level of accuracy required for the production of given sounds.
With simulations performed with two kinds of biomechanical models of the orofacial motor system, we have shown examples of the potential influences of speaker-specific biomechanics on the production of speech gestures. These examples show how inter-speaker differences in muscle anatomy can generate inter-speaker differences in motor control strategies and/or in articulatory and acoustic variability. Work is currently in progress in our lab to assess how these phenomena could influence coarticulation strategies. Coarticulation strategies determine the way gestures are organized, sequentially and in parallel, for the production of a speech sequence. Coarticulation strategies use the degrees of freedom of the speech production system to optimize the gestures while preserving the ultimate goal of speech production, namely its correct perception by listeners (Whalen, 1990; Lindblom, 1990). The example of the speaker-dependent impact of variations in the muscle activations around vowel /i/ (section 3) on articulatory and acoustic variability has shown how biomechanics can change the degrees of freedom and the accuracy in the achievement of a given speech task. The study of the impact of the Orbicularis Oris implementation on the production of rounded vowels (section 4) suggests that biomechanics can change the motor control strategies underlying the production of speech. With these two limited examples we do not claim to cover the whole range of the potential influences of biomechanics on speech motor control. We have shown that orofacial biomechanics can influence the emergence of motor strategies in speech production, because it affects the degrees of freedom and the accuracy of the control. Coarticulatory variability results to a large extent from the use of the degrees of freedom to anticipate forthcoming gestures and reduce speech effort, while preserving a satisfactory accuracy to enable a good perception of the speech signal.
Hence, it is likely that idiosyncrasies originate in part in speaker-specific biomechanical factors.
This could have an influence not only on speech production, but also on speech perception. It has been shown that an interaction exists between the motor control underlying the production of the sounds and perceptual boundaries for these sounds. For example, Shiller et al. (2009) perturbed the auditory feedback of speakers during the production of the fricative /s/, in order to make it sound more like a /ʃ/. To do so they shifted the spectral energy toward the low frequencies. They observed that the subjects tended to adjust their articulation so that the perceived sound was closer to their usual /s/, in spite of the perturbation of the auditory feedback. The subjects produced a more anterior articulation of /s/, in order for the spectral energy to move back to the high frequencies. Interestingly, a perceptual test run immediately after this experiment showed that the perceptual boundary between /s/ and /ʃ/ had moved: the subjects tolerated more low frequencies for /s/ than before the experiment. This result suggests that in the presence of the perturbed auditory feedback, due to the influence of the usual articulation of /s/ and /ʃ/, the subjects limited the articulatory changes and accepted a small shift in their perceptual boundaries. Since motor control seems to influence perceptual classes, we expect that the articulatory variability compatible with a correct perception of a sound could influence the tolerated perceptual variability. Thus, we can imagine a scenario in which idiosyncrasies would emerge from the interaction between biomechanical constraints, perceptual accuracy and social and cultural influences.
This work was supported by a German Research Council grant to the SPEECHart project (Grant Nr. FU 791/1-1).
Apostol, L. (2001). Étude et simulation des caractéristiques individuelles des locuteurs par modélisation du processus de production de la parole. Unpublished Doctoral dissertation, Grenoble: Institut National Polytechnique de Grenoble.
Browman, C.P., and Goldstein, L. (1985). Dynamic modeling of phonetic structure. In V. Fromkin (ed.). Phonetic Linguistics (pp. 35–53). New York: Academic Press.
Buchaillard, S., Perrier, P., and Payan, Y. (2009). A biomechanical model of cardinal vowel production: Muscle activations and the impact of gravity on tongue positioning. The Journal of the Acoustical Society of America, 126(4), 2033–2051.
Cos, I., Bélanger, N., and Cisek, P. (2011). The influence of predicted arm biomechanics on decision making. Journal of Neurophysiology, 105(6), 3022–3033.
Foulkes, P., and Docherty, G. (2006). The social life of phonetics and phonology. Journal of Phonetics, 34(4), 409–438.
Feldman, A. G. (1986). Once more on the equilibrium-point hypothesis (λ model) for motor control. Journal of Motor Behavior, 18(1), 17–54.
Flanagan, J. R., Ostry, D. J., and Feldman, A. G. (1990). Control of human jaw and multi-joint arm movements. In G.E. Hammond (Ed.), Cerebral Control of Speech and Limb Movements (pp. 29–58), North-Holland: Elsevier Science Publishers B.V.
Frère, J., and Hug, F. (2012). Between-subject variability of muscle synergies during a complex motor skill. Frontiers in Computational Neuroscience, 6, 99.
Fitch, W. T., and Giedd, J. (1999). Morphology and development of the human vocal tract: A study using magnetic resonance imaging. The Journal of the Acoustical Society of America, 106(3), 1511–1522.
Franklin, D. W., Liaw, G., Milner, T. E., Osu, R., Burdet, E., and Kawato, M. (2007). Endpoint stiffness of the arm is directionally tuned to instability in the environment. The Journal of Neuroscience, 27(29), 7705–7716.
Fuchs, S. and Perrier, P. (2005). On the complex nature of speech kinematics. ZAS Papers in Linguistics, 42, 137–165.
Fuchs, S., Winkler, R., and Perrier, P. (2008). Do speakers’ vocal tract geometries shape their articulatory vowel space? In Proceedings of ISSP 2008 – 8th International Seminar on Speech Production, pp. 333–336, Univ. Strasbourg, France.
Fuchs, S., Toda, M., and Żygis, M. (2010). Do differences in male versus female /s/ reflect biological or sociophonetic factors? In Fuchs, S., Toda, M., and Żygis, M. (Eds.), Turbulent sounds: An interdisciplinary guide (pp. 281–302), Walter de Gruyter.
Fuchs, S., Perrier, P., and Hartinger, M. (2011). A critical evaluation of gestural stiffness estimations in speech production based on a linear second-order model. Journal of Speech, Language, and Hearing Research, 54(4), 1067–1076.
Gay, T., Boë, L.-J., and Perrier, P. (1992). Acoustic and perceptual effects of changes in vocal tract constrictions for vowels. The Journal of the Acoustical Society of America, 92(3), 1301–1309.
Gray, H. (1918). Anatomy of the human body. Philadelphia: Lea and Febiger, in Bartleby.com, 2000.
Gribble, P. L., and Ostry, D. J. (1996). Origins of the power law relation between movement velocity and curvature: modeling the effects of muscle mechanics and limb dynamics. Journal of Neurophysiology, 76(5), 2853–2860.
Harshman, R., Ladefoged, P., and Goldstein, L. (1977). Factor analysis of tongue shapes. The Journal of the Acoustical Society of America, 62(3), 693–707.
Hazen, K. (2002). Identity and language variation in a rural community. Language, 78(2), 240–257.
Hill, A. V. (1938). The heat of shortening and the dynamic constants of muscle. Proceedings of the Royal Society B: Biological Sciences, 126, 136–195.
Honda, K. (1996). Organization of tongue articulation for vowels. Journal of Phonetics, 24(1), 39–52.
Hug, F., Turpin, N. A., Guével, A., and Dorel, S. (2010). Is interindividual variability of EMG patterns in trained cyclists related to different muscle synergies? Journal of Applied Physiology, 108(6), 1727–1736.
Jackson, M. T. (1988). Analysis of tongue positions: Language‐specific and cross‐linguistic models. The Journal of the Acoustical Society of America, 84(1), 124–143.
Kelso, J. S., Vatikiotis-Bateson, E., Saltzman, E. L., and Kay, B. (1985). A qualitative dynamic analysis of reiterant speech production: Phase portraits, kinematics, and dynamic modeling. The Journal of the Acoustical Society of America, 77(1), 266–280.
Laboissière, R., Ostry, D. J., and Feldman, A. G. (1996). The control of multi-muscle systems: human jaw and hyoid movements. Biological Cybernetics, 74(4), 373–384.
Ladefoged, P. (1984). Out of chaos comes order: Physical, biological, and structural patterns in phonetics. Proceedings of the 10th International Congress of Phonetic Sciences, pp. 83–95.
Ladefoged, P., and Maddieson, I. (1996). The sounds of the world’s languages. Oxford: Blackwell.
Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H theory. In Hardcastle, W. J., and Marchal, A. (Eds.), Speech production and speech modelling (pp. 403–439). Springer, The Netherlands.
Mooshammer, C., Hoole, P., and Kühnert, B. (1995). On loops. Journal of Phonetics, 23(1), 3–21.
Munson, B., and Babel, M. (2007). Loose lips and silver tongues, or, projecting sexual orientation through speech. Language and Linguistics Compass, 1(5), 416–449.
Nazari, M. A., Perrier, P., Chabanas, M., and Payan, Y. (2010). Simulation of dynamic orofacial movements using a constitutive law varying with muscle activation. Computer Methods in Biomechanics and Biomedical Engineering, 13(4), 469–482.
Nazari, M. A., Perrier, P., Chabanas, M., and Payan, Y. (2011). Shaping by stiffening: a modeling study for lips. Motor Control, 15(1), 141–168.
Payan, Y., and Perrier, P. (1997). Synthesis of VV sequences with a 2D biomechanical tongue model controlled by the Equilibrium Point Hypothesis. Speech Communication, 22(2), 185–205.
Perrier, P., Boë, L.-J., and Sock, R. (1992). Vocal tract area function estimation from midsagittal dimensions with CT scans and a vocal tract cast: Modeling the transition with two sets of coefficients. Journal of Speech and Hearing Research, 35, 53–67.
Perrier, P., Lœvenbruck, H., and Payan, Y. (1996). Control of tongue movements in speech: The equilibrium point hypothesis perspective. Journal of Phonetics, 24(1), 53–75.
Perrier, P., Perkell, J. S., Payan, Y., Zandipour, M., Guenther, F., and Khaligi, A. (2000). Degrees of freedom of tongue movements in speech may be constrained by biomechanics. In Proceedings of the 6th International Conference on Spoken Language and Processing (ICSLP) (Vol. 2, pp. 162–165). Beijing, China.
Perrier, P., Payan, Y., Zandipour, M., and Perkell, J. (2003). Influences of tongue biomechanics on speech movements during the production of velar stop consonants: A modeling study. The Journal of the Acoustical Society of America, 114(3), 1582–1599.
Perrier, P., and Fuchs, S. (2008). Speed–curvature relations in speech production challenge the 1/3 power law. Journal of Neurophysiology, 100(3), 1171–1183.
Rogers, C. R., Mooney, M. P., Smith, T. D., Weinberg, S. M., Waller, B. M., Parr, L. A., Docherty, B. A., Bonar, C. J., Reinholt, L. E., Deleyiannis, F. W.-B., Siegel, M. I., Marazita, M. L., and Burrows, A. M. (2009). Comparative microanatomy of the orbicularis oris muscle between chimpanzees and humans: evolutionary divergence of lip function. Journal of Anatomy, 214(1), 36–44.
Schwartz, J.-L., Beautemps, D., Abry, C., and Escudier, P. (1993). Inter-individual and cross-linguistic strategies for the production of the [i] vs. [y] contrast. Journal of Phonetics, 21, 411–425.
Shiller, D. M., Sato, M., Gracco, V. L., and Baum, S. R. (2009). Perceptual recalibration of speech sounds following speech motor learning. The Journal of the Acoustical Society of America, 125(2), 1103–1113.
Stavness, I., Gick, B., Derrick, D., and Fels, S. (2012). Biomechanical modeling of English /r/ variants. The Journal of the Acoustical Society of America, 131(5), EL355–EL360.
Stavness, I., Nazari, M. A., Perrier, P., Demolin, D., and Payan, Y. (2013). A biomechanical modeling study of the effects of the orbicularis oris muscle and jaw posture on lip shape. Journal of Speech, Language, and Hearing Research, 56(3), 878–890.
Stavness, I., Nazari, M. A., Flynn, C., Perrier, P., Payan, Y., Lloyd, J. E., and Fels, S. (2014). Coupled biomechanical modeling of the face, jaw, skull, tongue, and hyoid bone. In 3D Multiscale Physiological Human (pp. 253–274). Springer London.
Story, B. H., Laukkanen, A.-M., and Titze, I. R. (2000). Acoustic impedance of an artificially lengthened and constricted vocal tract. Journal of Voice, 14(4), 455–469.
Takemoto, H. (2001). Morphological analyses of the human tongue musculature for three-dimensional modeling. Journal of Speech, Language, and Hearing Research, 44(1), 95–107.
Titze, I. R. and Story, B. H. (2002). Rules for controlling low-dimensional vocal fold models with muscle activation. The Journal of the Acoustical Society of America, 112, 1064–1076.
Whalen, D. H. (1990). Coarticulation is largely planned. Journal of Phonetics, 18, 3–35.
Viviani, P., and Stucchi, N. (1992). Biological movements look uniform: evidence of motor-perceptual interactions. Journal of Experimental Psychology: Human Perception and Performance, 18(3), 603–623.
Winkler, R., Fuchs, S., Perrier, P., and Tiede, M. (2011a). Speaker-specific biomechanical models: From acoustic variability via articulatory variability to the variability of motor commands in selected tongue muscles. In 9th International Seminar on Speech Production (ISSP 2011) (pp. 219–226). Montréal Canada.
Winkler, R., Fuchs, S., Perrier, P., and Tiede, M. (2011b). Biomechanical tongue models: An approach to studying inter-speaker variability. In 12th Annual Conference of the International Speech Communication Association (Interspeech 2011) (pp. 273–276).
Winters, D. A. (2009). Biomechanics and motor control of human movement (4th Edition). John Wiley & Sons, Inc.
1 A second-order system is a mechanical system whose dynamics are described by a second-order differential equation with coefficients (mass, stiffness, damping) that are constant over time.
2 In children things are more complex since the action of the tongue on the palate during swallowing and perhaps speech production largely influences the final palatal shape, and because tongue movements in general contribute to the evolution of the vocal tract during vocal tract growth.