Table of Contents
- About the author
- About the book
- This eBook can be cited
- Prosody and conceptional variation: an introduction
- 1. Methodological considerations of prosody modeling and data collection
- Data-driven prosody modeling using PaIntE
- Prosodic breaks in talk-in-interaction: phonetic forms and communicative functions in German dialogues
- Intonational phrasing in French in the light of oral corpora
- Same or different? Prosodic prominence and information structure in Yucatecan Spanish in picture-based elicitation compared to natural speech
- Different tasks for different proficiency levels? An L2 Discourse Completion Task for German learners of French
- 2. Case studies
- Prosody and conceptional variation: conversational self-repair in French
- Intonational variation in French interrogatives in reality television
- Prosodic features and situational settings: doing reading aloud in a French writing class
- Conceptional profile and prosodic marking: the representation of reported speech in different Spanish text types
- Intonational convergence in Bulgarian Judeo-Spanish spontaneous speech
- Intonation und Bedeutung. Frageintonationen im italiano regionale von Bergamo (Intonation and meaning: question intonation patterns in the regional Italian of Bergamo)
- The impact of information and prosodic structure on the phonetic implementation of vowel length in Ligurian
Alexander M. Teixeira Kalkhoff, Maria Selig & Christine Mooshammer
The present volume gathers twelve contributions from the “Prosody and conceptional variation” panel at the Deutscher Romanistentag held in Zürich in 2017. Our panel tied in with recent advances in Romance prosody research (Prieto, Borràs-Comes & Roseano 2010–2014; Gess, Lyche & Meisenburg 2012; Frota & Prieto 2015; Feldhausen, Fliessbach & Vanrell 2018; García García & Uth 2018; Grice, Savino & Roettger 2018; Heinz & Moroni 2018; Torreira & Grice 2018; Grice, Vella & Bruggeman 2019). The specific perspective of the Zürich panel, however, was to address variation in prosodic design related to situational variation (i.e., variation along communicative parameters) such as the degree of familiarity between the partners, the degree of their emotional involvement with one another, the degree of communicative collaboration, the degree of dialogicity, the degree of planning, and so on.
This kind of variation has mostly been approached via what might be called the spoken-written paradigm, leading to binary oppositions such as “spoken or written language” and “spoken or written speech styles”. Ever since Peter Koch and Wulf Oesterreicher’s synthesis and their model of conceptional variation between the poles of communicative immediacy and communicative distance (Koch & Oesterreicher 1985, 2011), however, we know that it is not primarily, and certainly not solely, mediality that is at stake when analysing this kind of variation. It is far more appropriate to relate the multidimensional and scalar features of conceptional variation to communicative choices made by both speaker and hearer, so as to differentiate the function and form of their linguistic behaviour.
The ultimate basis of conceptional variation, then, is the human ability to react to contextual embedding, to adapt linguistic behaviour to communicative conditions, and to create a wide range of shared practices that meet the multitude of communicative needs we face. Research on activity types (Levinson 1979), on genres or registers (Biber 1995; Biber & Conrad 2009), and on discourse traditions (Koch 1997; Wilhelm 2001; López-Serena 2011; Winter-Froemel et al. 2015) clearly tackles this kind of variation as well. Thus, when postulating that conceptional variation is essential to prosodic analysis, we draw on both linguistic traditions to argue that the situational embedding of prosodic phenomena, and the variation related to it, must be integrated into prosodic research designs.
We shall briefly summarise the issues raised by the Zürich panel. The wide range of communicative situations covered by the contributions illustrates the idea that variation related to contextual embedding is essential to prosodic research. All contributions worked with empirical data, either with “real-world” corpora or with data elicited via various experimental designs. Regarding genres, the panel covered unplanned polylogical interactions between more or less familiar partners (Uth; Stahnke; Antonioli & Moroni); unplanned interactions in broadcast settings (Reinhardt); guided interviews varying in the degree of familiarity and of private/public embedding (Schweitzer; Stahnke; Gerstenberg & Kairet; Andreeva et al.); planned monological performances in public, either direct or media-transmitted (Grutschus); and read-aloud activities in private (Gerstenberg & Kairet) or public settings (Schweitzer; Reinhardt; Grutschus). We might add the internal variation of the experimental designs, including (artificially) initiated and thematically centred dialogical interaction (Peters), Discourse Completion Tasks (DCTs; Uth; Meisenburg & Gabriel), and elicited carrier sentences and word lists (Garassino & Filipponio).
At first glance, the variety of conceptional profiles at play might seem to reflect the contingencies of particular research interests or the difficulty of obtaining reliable acoustic data. It should be pointed out, however, that conceptional variation was and is at the centre of a lively debate on methodological and theoretical issues in prosodic research. First, there is the question of how to deal with the complex and multidimensional nature of empirical prosodic data. One can opt for elicitation in order to guarantee optimal technical conditions, sufficient amounts of data, and pragmatically controlled instances of the respective target structure. Prosody, on the other hand, is intimately linked to speakers’ attitudes, and variation in the spontaneity of linguistic encounters is directly reflected in prosodic performance, which controlled elicitation may fail to capture.
One major methodological issue for prosodic research is thus the choice of appropriate elicitation methods. Controlled laboratory settings, such as picture elicitation experiments or read speech, which succeed in “standardising” the subjects’ output for ease of comparison, cannot be dispensed with (Face 2002; Zubizarreta & Vergnaud 2005; Skopeteas et al. 2006). Nevertheless, eliciting more natural speech data and researching spontaneous corpus data remain major challenges (Gabriel 2007; Riester & Baumann 2013; Buchholz & Reich 2018; García García & Uth 2018). Interestingly, the Zürich panel showed a second way of dealing with varying communicative contexts: several contributions relied on an explicit comparison of contrasting or adjacent conceptional settings (Schweitzer; Uth; Stahnke; Reinhardt; Gerstenberg & Kairet; Grutschus). This approach seems promising, as it fosters the idea that there is no “normal”, decontextualised prosody, but rather varying prosodic strategies embedded in situational settings, each legitimate in its own right. Variation, in this approach, is not “deviation” but difference; it is not chaotic and random, but adaptive and systematic.
This new approach to variation brings some of the main topics of current prosodic theory formation to the fore. Prosodic variation can shed new light on the relation of phonetics to phonology, one of the most important topics in prosodic research since the earliest days of the field. The tension between what might be called a concrete approach and a theory-driven one (Ladd & Cutler 1983) shows up, for instance, when phonetic research on prosody is confronted with Intonational Phonology. Currently, the most frequently used annotation scheme in prosody is ToBI, which is grounded in the autosegmental-metrical approach (Ladd ²2008; Beckman, Hirschberg & Shattuck-Hufnagel 2005). This system and its language-specific adaptations have the drawback of being essentially phonological and of relying on the auditory impression and semantic interpretation of the annotator. More recent approaches, such as DIMA (Deutsche Intonation: Modellierung und Annotation, i.e., German intonation: modelling and annotation; see Kügler et al. 2019), try to curb this problem by separating the annotation level from the functional interpretation. DIMA integrates several acoustic correlates of prominence and phrasing, such as voice quality and tonal breaks, with a more fine-grained prominence marking on a separate tier. Nevertheless, DIMA, like most other annotation systems, presupposes existing categories that the annotator must label in some way.
As an alternative to this top-down approach, annotation can also be based directly on the acoustic signal and/or on more or less refined derived signals. The observed prosodic patterns can then be classified into common categories and compared to traditional annotation systems. One such bottom-up approach, using continuous wavelet transforms of the fundamental frequency, intensity, and segmental durations, was presented at our workshop by Juraj Šimko (published as Suni et al. 2017). Applied to a news corpus with a priori word segmentation, this unsupervised algorithm automatically classified the derived signals into prominences and phrase boundaries, yielding a hierarchical representation. Another example of a data-driven approach was developed by Reichel (2010, 2014): based on stylised f0 contours, local patterns and superimposed global patterns were extracted and then perceptually evaluated by human listeners. Automatic annotation of large corpora is helpful, if not essential, for improving speech synthesis and recognition (Suni et al. 2017). One example of a semi-automatic approach is PaIntE (Parameterised Intonation Events; see Schweitzer in this volume).
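To give a rough idea of how such a bottom-up procedure operates, the Python sketch below implements a heavily simplified version of the wavelet idea. It is not the algorithm of Suni et al. (2017). It assumes frame-level feature tracks of equal length (for example f0 and intensity), z-normalises and sums them, convolves the result with a Mexican-hat (Ricker) wavelet at one or more scales, and treats positive local maxima of the summed response as prominence candidates. The function names, the scales, the threshold, and the summing of scales are illustrative assumptions; the published method works on the full multi-scale representation and derives a prosodic hierarchy from it.

```python
import math
from typing import List, Sequence


def zscore(values: Sequence[float]) -> List[float]:
    """Normalise a feature track to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    sd = math.sqrt(var) or 1.0  # avoid division by zero for flat tracks
    return [(v - mean) / sd for v in values]


def ricker(t: float, scale: float) -> float:
    """Unnormalised Mexican-hat (Ricker) wavelet at offset t (in frames)."""
    u = t / scale
    return (1.0 - u * u) * math.exp(-0.5 * u * u)


def wavelet_response(signal: Sequence[float], scale: float) -> List[float]:
    """Convolve the signal with a Ricker wavelet truncated at +/- 4 scales.

    Samples outside the signal are skipped, i.e. treated as zero.
    """
    half = int(4 * scale)
    kernel = [(k, ricker(k, scale)) for k in range(-half, half + 1)]
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in kernel:
            j = i + k
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def prominence_peaks(tracks, scales=(2.0, 4.0, 8.0), threshold=0.0):
    """Return frame indices of candidate prominences (hypothetical helper).

    `tracks` is a list of equal-length feature tracks. They are z-normalised
    and summed; the wavelet responses at all scales are added up, and
    positive local maxima of that sum are reported.
    """
    n = len(tracks[0])
    combined = [0.0] * n
    for track in tracks:
        for i, v in enumerate(zscore(track)):
            combined[i] += v
    summed = [0.0] * n
    for s in scales:
        for i, v in enumerate(wavelet_response(combined, s)):
            summed[i] += v
    return [
        i for i in range(1, n - 1)
        if summed[i] > summed[i - 1]
        and summed[i] >= summed[i + 1]
        and summed[i] > threshold
    ]
```

For instance, synthetic f0 and intensity tracks of 80 frames with Gaussian bumps centred on frames 20 and 60 yield `[20, 60]` with `scales=(4.0,)`. Small scales respond to syllable-sized events and large scales to broader stretches, which is the intuition behind reading prominence and phrasing off one time-scale representation.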
Discussing the pros and cons of bottom-up and top-down strategies is not merely a methodological matter. Theoretical issues are also at stake in deciding whether prosodic research should rely on the interplay of all prosodic features (i.e., tonal, durational, and energy levels as well as voice quality) or narrow down to the one feature assumed to be essential (i.e., the phonological one), as Autosegmental-Metrical (AM) Phonology does, for example. The AM model focuses only on tonal structure and on perceptually salient turning points or changes (pitch accents and boundary tones, phonetically implemented by the fundamental frequency) within the domain of intonational phrases, mostly at their end before major prosodic breaks. Because of its strong assumption of a direct relationship between nuclear pitch configurations and speech acts, the goal is to elicit the intonational design of a limited number of predefined speech acts, such as statements with different focus types, yes/no-questions, or imperatives (see the data elicitation protocol for Romance intonation in Frota & Prieto 2015).
Thus, both the data elicitation methods, such as the DCT (Discourse Completion Task) and the Map Task (Prieto 2012; Vanrell, Feldhausen & Astruc 2018), and the subsequent tonal alignment are largely pragmatically controlled and knowledge-driven. Once the limited inventory of nuclear configurations is set, all intonational variation is subsumed under the assumed illocutionary act (Jun 2005–2014). We might add recent developments in focus phonology, where theory-driven approaches have provided insight into the role of prosody in information structuring and focus marking within utterances (Krifka & Musan 2012; Féry 2013; Féry & Destruel Johnson 2019).
The question of which way to choose must, obviously, remain open, but it is important to take into account the theoretical assumptions underlying these approaches in order to gain, step by step, a better understanding of their methodological differences. The empirical-theoretical tension appears yet again when phrasing and the delimitative function of prosody are at stake. It is generally agreed that all prosodic features play a major role in building up the (internal) cohesion and the (external) delimitation of units of speech, and both the delimitative function and the interplay of prosodic features have always been at the centre of prosodic interest. What remains controversial, however, is the exact role of syntax in prosodic phrasing. Interface models of syntax–prosody mapping (Selkirk 1984) underline the close dependency of prosody on syntactic categories. On the other hand, research on corpora of spontaneous informal interactions has gathered ample evidence for much looser connections: even though there are strong mapping tendencies, no perfect one-to-one syntax–prosody mapping exists (Delais-Roussarie, Yoo & Post 2011; Kentner & Kremers 2020), and cohesion and delimitation seem to depend not exclusively on syntactic and propositional features, but also on cognitive processing factors (Anderson ⁷2013: 275ff.), textual-rhetorical dimensions, and interactional, listener-oriented cueing.
Conceptional variation, related to the degree of planning/spontaneity or of dialogicity/monologicity, may in part explain some of these discrepancies. It is interesting, then, to contrast text- and interaction-based notions of prosodic phrasing, such as the “paragraphe oral” (Morel & Danon-Boileau 1998), the “intonational packaging” of TCUs (turn constructional units) (Schegloff 2007), or the “syntactic gestalt” and prosody as part of the macro-syntactic (i.e., textual) planning and organisation of utterances (Auer 2009, 2010a, 2010b, 2014), with exclusively syntax-driven approaches.
This leads us to focus on the role prosody plays in interaction (Selting 1995a; Barth-Weingarten, Reber & Selting 2010; Bergmann et al. 2013; Couper-Kuhlen & Selting 2018; Kohler 2018). Interactional linguistics and interactional prosody conceptualise prosody as one of several resources of fundamentally multimodal human communication. There has been extensive research on turn-taking organisation in conversation (i.e., the use of prosodic means to show intent to continue a turn, to yield the floor, to request confirmation, and so forth) (Walker 2010, 2013; Zhang 2012; Couper-Kuhlen & Pfänder 2019); on signalling specific entrenched communicative formats, such as lists or word searches (Günthner 2000, 2018; Selting 2007; Dressel & Teixeira Kalkhoff 2019; Dressel, Dankel & Teixeira Kalkhoff accepted in 2020); on signalling reported or imagined speech (Ehmer 2011); and on bridging online processing difficulties.
We might also add the use of prosodic means to mark evidentiality (Vanrell Bosch, Armstrong & Prieto 2014), truth value or epistemic status (Fliessbach & Reich 2014), modality (Waltereit 2006; Reich 2018), and other pragmatic categories such as illocutionary meaning or information packaging. Such pragmatic categories depend strongly on partner-oriented interaction and on the concrete values this interaction takes on. The tension between interaction-based research and approaches embedded in formal and sentential (i.e., not textual) perspectives thus promises to highlight some of the major problems in linguistic description and theory formation.
Our edited volume consists of two sections: the first assembles broader methodological considerations on prosody modelling and the corresponding data collection; the second presents case studies of French, Spanish, Bulgarian Judeo-Spanish, and Italian prosodic features in different speech styles or activity types.
The first section opens with Antje Schweitzer’s reflection on a parametric, data-driven approach to intonation and its variation. Using the PaIntE model both for analysis and for synthesis (i.e., predicting fundamental frequency contours via machine learning), she advocates a quantitative, phonetic, and largely theory-independent model of intonation. Schweitzer identifies six measurable phonetic parameters of the PaIntE function that models the fundamental frequency contour, such as the temporal alignment and the height of the f0 peak. Using a corpus of speech by professional radio news speakers, she manually labels 19,000 pitch accents according to the German ToBI categories and maps them to these continuous acoustic parameters. She shows that the internal configuration of the six PaIntE parameters, as well as the position of the accented syllable (word-internal or word-final), is highly relevant for the realisation of the prosodic categories. Interestingly, automatic accent detection and recognition systems trained on her data achieved highly satisfactory accuracy only as long as comparable speech styles were being analysed. When they were applied to more spontaneous and more dialogical radio data, the rate of correct identification was considerably lower. Schweitzer therefore concludes that “the phonetic implementation of prosodic categories, if not the categories themselves, is subject to great variability between speech styles.”
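For readers unfamiliar with PaIntE, the Python sketch below evaluates a PaIntE-style model function, assuming the form usually given in descriptions of the model: two sigmoids subtracted from a peak value, f(x) = d − c1·σ(a1(b − x) − γ) − c2·σ(a2(x − b) − γ), where σ is the logistic function. Here b locates the peak in time, c1 and c2 are the amplitudes of the rise and the fall, a1 and a2 their steepness, and d is the height that the peak approaches from below. The syllable-normalised time axis, the constant γ = 2, and the example values are assumptions made for illustration; the sketch only evaluates the function and does not perform the fitting of the six parameters to real f0 contours that Schweitzer’s analyses rely on.

```python
import math
from dataclasses import dataclass


def logistic(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


@dataclass
class PaintEParams:
    """The six free parameters (names follow common PaIntE descriptions)."""
    a1: float  # steepness of the rise
    a2: float  # steepness of the fall
    b: float   # temporal alignment of the peak (syllable-normalised time)
    c1: float  # amplitude of the rise (Hz)
    c2: float  # amplitude of the fall (Hz)
    d: float   # peak height (Hz); the modelled contour always stays below d


def painte(x: float, p: PaintEParams, gamma: float = 2.0) -> float:
    """Model f0 (Hz) at time x; gamma = 2.0 is an assumed constant."""
    # About c1 long before the peak, shrinking as x approaches b.
    rise = p.c1 * logistic(p.a1 * (p.b - x) - gamma)
    # About 0 before the peak, growing towards c2 after it.
    fall = p.c2 * logistic(p.a2 * (x - p.b) - gamma)
    return p.d - rise - fall


def sample_contour(p: PaintEParams, start: float, end: float, steps: int):
    """Sample the modelled contour at `steps` evenly spaced time points."""
    dx = (end - start) / (steps - 1)
    return [(start + i * dx, painte(start + i * dx, p)) for i in range(steps)]
```

With, say, `PaintEParams(a1=5, a2=5, b=1.5, c1=40, c2=60, d=200)`, the contour starts near 160 Hz, peaks around x = 1.5 somewhat below 200 Hz, and ends near 140 Hz. Because each parameter controls one interpretable aspect of the accent shape, comparing fitted values across speech styles is a natural way to quantify the variability the chapter reports.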
Benno Peters’ contribution investigates the phonetic characteristics of prosodic phrase boundaries in German spontaneous dialogic speech. These prosodic breaks facilitate the syntactic parsing and cognitive processing of content, organise turn-taking, and indicate difficulties in planning and executing an utterance. On the basis of a qualitative sequential analysis of nine examples and a quantitative analysis of 4,000 prosodic breaks from a large corpus, Peters relates the phonetic characteristics of prosodic phrase boundaries to their communicative functions in conversation. He shows that prosodic breaks are cued by strong pitch movements, pitch reset, segmental lengthening, pauses and breathing, and changes in voice quality and intensity. Speakers bundle these phonetic features flexibly to signal different functions. His findings indicate that the use and interpretation of phonetic information in intrinsically multimodal conversation is always highly context-sensitive rather than an autonomous system with strict categories.
Elisabeth Delais-Roussarie also addresses prosodic phrasing in spoken language. Analysing oral corpus data, she examines the nature and status of the intonational phrase (IP) compared to the intermediate phrase (ip) and the accentual phrase (AP) in spoken French. While the accentual phrase is constrained metrically (accentuation, rhythm) and the intermediate phrase by sub-clausal syntax, only the intonational phrase is clearly phonological. She shows that CP boundaries and the boundaries of so-called Comma Phrases (dislocations, parentheticals) delimit an IP that is neither sensitive to metrical constraints nor subject to restructuring, and that is marked by the most salient prosodic boundaries within an utterance. Moreover, she argues that the phonetic realisations of prosodic phrase boundaries in the spoken corpora vary widely, depending on factors such as speech rate or speech style. The strength of a prosodic boundary is thus not absolute but relative: it reflects the distribution of prosodic units in the utterance and the position of the respective unit among them.
Melanie Uth addresses the question of an appropriate design for semi-spontaneous picture-based experiments used to gather empirical data on prominence patterns related to information structure. She presents the results of a comparative elicitation experiment on syntactic and prosodic realisations of focus in Yucatecan Spanish, in which she contrasted three different elicitation designs. She shows that the established picture task used with the first group, and in slightly revised form with the second group, produced pragmatic mismatches (such as asking for information that was too obvious), thereby inappropriately influencing the speakers’ propositional attitudes. The third group, in contrast, was presented with a newly created picture story that explicitly motivated the questions and the requested answers; this group produced intonational patterns much closer to those attested in spontaneous Yucatecan Spanish.
Trudel Meisenburg and Christoph Gabriel present their experience with, and analyses of, an elicitation setting for the syntactic and intonational interrogative patterns of German L2 learners of French at three proficiency levels (A1, A2/B1, and C1, according to the Common European Framework of Reference for Languages). To avoid both unspontaneous read speech and nearly uncontrollable spontaneous speech, they used a simplified, learner-oriented DCT to test L2 proficiency and L1 transfer in L2 prosody. The learners had to respond to six situations eliciting yes/no-questions and five eliciting wh-questions. As Meisenburg and Gabriel hypothesised, the L2 learners generalise the final-rise interrogative pattern (H%) and the est-ce que question marker as a result of prosodic L1 transfer and of typical teaching approaches to interrogative syntax. In light of their study, they discuss the strengths and weaknesses of their L2 learner DCT.
The second section of this volume begins with Johanna Stahnke’s corpus-based investigation of the intonational realisation of conversational self-repair in French. She analyses two types of self-repair, paraphrases (roughly, modifications of a previously uttered stretch of talk, usually combined with parenthetical, deaccentuated intonation) and corrections (roughly, cancellations of a previously uttered stretch of talk, usually overaccentuated and combined with focus intonation patterns), in two communicative settings, namely political interviews and private conversations. Her underlying tenet is that the conceptional profile of the two settings (communicative distance in political interviews vs. communicative immediacy in private conversations) influences the quantity and quality of self-repair strategies. From this follows the hypothesis that both repair strategies are more frequent in unplanned, spontaneous private conversations; moreover, these conversations should show a tendency to overaccentuate the intonational pattern inherent to each strategy (parenthetical for paraphrases, focussing for corrections).
Surprisingly, her empirical data do not confirm these hypotheses in a straightforward manner. Under communicative distance there are, unexpectedly, more paraphrases than under communicative immediacy. Only for corrections, which are overall clearly the less preferred and therefore less frequent repair type, does the expected distribution hold, with more corrections under immediacy. Furthermore, the expected tendency towards expressive overaccentuation of both repair types under communicative immediacy does not appear in the analysed data. Instead, whereas speakers in the political interviews tend to overaccentuate both repair patterns, in the private conversations there is a clear tendency to camouflage corrections through deaccentuation. Stahnke relates this blurring of intonational differences to turn-taking strategies: since intonational correction patterns are perceived as turn-taking signals, masking corrections intonationally as paraphrases allows speakers in colloquial speech to keep the floor.
In her study, Janina Reinhardt examines 3,000 direct interrogatives from French reality television shows and their intonational patterns. Her statistical evaluation shows that four factors influence the final tonal movement: semantic type (yes/no- or wh-questions), morphosyntactic marking (wh-words, est-ce que, or subject-verb inversion), the position of the question word (wh-ex-situ vs. wh-in-situ), and speech type (colloquial-speech-like on-screen interactions vs. non-spontaneous scripted voice-overs in the versions edited for broadcast). Although Reinhardt finds clear tendencies towards a final rise in yes/no-questions, morphosyntactically unmarked structures, and wh-in-situ interrogatives, and towards a final fall in wh-questions, there is no categorical link between these factors and the intonational shape (final rise or fall). In principle, falling and rising intonation are possible for all question types, even for string-identical ones, without necessarily changing the meaning. For the reality television setting, however, Reinhardt points out that speech type seems to have a strong impact: the proportion of final rises is significantly higher in the direct interactions between participants than in the scripted (i.e., conceptionally more distance-marked) voice-overs. A final rise thus seems to be associated with a direct appeal to the addressee, whereas a final fall merely points to an information gap without insisting on dialogical interaction.
Annette Gerstenberg and Julie Marie Kairet compare the prosodic properties of two communicative activities, reading aloud and being interviewed, in five elderly speakers of French. Their underlying tenet is that so-called phonogenres, here reading aloud in a French writing class, have their own specific phonetic-prosodic characteristics, which evoke the intended situational, functional, and interactional specifics for both speaker and hearer. Gerstenberg and Kairet focus on pauses, pitch range, and intonational contours. Their analysis shows that in the reading-aloud activity the pitch floor is higher, the pitch range wider, and falling contours more frequent than in the interviews, whereas differences in the length and frequency of pauses are random and not significant. While the high proportion of falling contours reflects alignment with the written text’s punctuation, the higher voice register and the wider pitch range are clear signs of the phonogenre of reading aloud in public, as evidenced by the fact that speakers systematically stage the beginning of their reading-aloud activity by shifting to a higher pitch range.
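The summary above does not specify how pitch floor and pitch range were measured. As a hedged illustration of one common operationalisation, the sketch below assumes frame-wise f0 values in Hz (0 for unvoiced frames), takes a low percentile of the voiced values as the pitch floor, and expresses the span between a low and a high percentile in semitones, 12·log2(f_high / f_low). The 5th/95th percentile cut-offs, the nearest-rank percentile method, and the function names are hypothetical choices, not necessarily those of Gerstenberg and Kairet.

```python
import math
from typing import Sequence, Tuple


def nearest_rank(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of an ascending, non-empty sequence."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def semitones(f_hz: float, ref_hz: float) -> float:
    """Distance between two frequencies in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def pitch_floor_and_range(f0_hz: Sequence[float], low_pct: float = 5.0,
                          high_pct: float = 95.0) -> Tuple[float, float]:
    """Return (pitch floor in Hz, pitch range in semitones) over voiced frames."""
    voiced = sorted(f for f in f0_hz if f > 0)  # drop unvoiced frames (f0 == 0)
    if not voiced:
        raise ValueError("no voiced frames")
    floor_hz = nearest_rank(voiced, low_pct)
    top_hz = nearest_rank(voiced, high_pct)
    return floor_hz, semitones(top_hz, floor_hz)
```

Computing both values separately for a speaker’s reading-aloud and interview recordings would turn “higher pitch floor” and “wider pitch range” into directly comparable numbers. Using semitones rather than Hz keeps the range measure comparable across speakers with different voice registers.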
- Publication date: 2021 (April)
- Keywords: prosody, prosody modeling, conceptional variation, data collection, prosodic breaks, intonational phrasing
- Berlin, Bern, Bruxelles, New York, Oxford, Warszawa, Wien, 2021. 240 pp., 75 fig. b/w, 41 tables.