Investigating Discourse and Texts

Maci, Stefania; Garofalo, Giovanni

Corpus-Assisted Analytical Perspectives

by Stefania Maci (Volume editor) Giovanni Garofalo (Volume editor)

Linguistics

Series: Linguistic Insights, Volume 306

Summary

This volume delves into Corpus-Assisted Discourse Studies (CADS), providing a deeper understanding of the social practices that underpin discourse creation and the recurring characteristics of their associated textual elements. Divided into two sections, the volume clarifies CADS methodologies and showcases their applications, shedding light on a broad spectrum of topics such as sentiment analysis, corpus annotation, recurrent constructions at the intersection of lexicon and syntax, as well as strategies shaping the discourse of politics, media and healthcare. Its clear style, methodological depth, and practical case studies make it suitable for academics and PhD students involved in CADS.

Excerpt

Contents
Investigating discourse and texts through Corpus-Assisted Discourse Studies (CADS)
Part I: CADS as a methodological approach
The linguist’s role in sentiment analysis: From knowledge provider to data annotator
Hypervalues in tagsets and their impact on the automatic morphosyntactic annotation of Spanish
A proposal for the selection and prioritization of lexical bundles in specialised discourse
Between the construction and the phraseological unit: The way-construction in English and Spanish
Why why-fragments? A corpus-based constructionist analysis of their form and meaning
Part II: Case studies
The intonation of directive (in)subordinate if-clauses in American English: A corpus study
A critical discourse analysis of 2020 US presidential debates
Who’s friends with the victim? A corpus-based stylistic approach to the analysis of the TV series The Killing
The role of age in the Twitter discourse of British rappers and singers
A contrastive corpus-assisted study of the portrayal of Afghan women in the Twitter discourse during the 2021 Taliban takeover
All together now: Disentangling Beatles’ song titles in medical research articles
Notes on contributors

Giovanni Garofalo / Stefania M. Maci¹

Investigating discourse and texts through Corpus-Assisted Discourse Studies (CADS)

1. Corpus-Assisted Discourse Studies: A methodological synergy to analyse discourse and text-building strategies

In recent decades, qualitative discourse analysis has generated a substantial amount of valuable research that has contributed to our comprehension of linguistic patterns beyond individual sentences (Jaworski and Coupland 2014). This research has also shed light on how language, power and social dynamics interact through an intricate network of established social practices perpetuated by the verbal conduct of speakers. In particular, the seminal works by Fairclough (1992, 1995, 2003), van Dijk (1993, 1998) and Wodak and Meyer (2001), to name but a few, demonstrated how discourse puts ideology and power into action and, specifically, how communication is involved in forming and upholding unequal power dynamics, disadvantage, and discrimination (Gillings et al. 2023: 5). This consolidated line of research, known as Critical Discourse Studies / Critical Discourse Analysis (CDS / CDA), openly declared its politically committed goals, as its main purpose was not merely to investigate social injustices but also to redress them, through an eclectic methodological framework that drew from various disciplines adjacent to linguistics (e.g., sociology, anthropology or social sciences). At the heart of this speculation was the Foucauldian notion of discourse, conceived as a system of statements, practices, and ideas that shape and produce knowledge within a specific historical and social context (Foucault 1969). Discourses are not just about language; they encompass a broader network of power, knowledge, and social structures. Moreover, discourses are never neutral, since they convey power and contribute to the construction of reality and the shaping of social norms and identities. For this reason, discourses are closely tied to mechanisms of social control and exert influence over how people think, behave, and understand the world around them.

As rightly observed by Gillings, Mautner and Baker (2023: 6), CDS / CDA usually relies on meticulous perusal and interpretative analysis of individual texts. However, this approach encounters a challenge, as in-depth reading and detailed description are feasible primarily with a limited corpus, such as a few newspaper articles or a small number of transcripts of oral texts. Consequently, the real issue at the heart of CDS / CDA is the representativeness of the texts upon which qualitative speculation is carried out. Indeed, one might wonder to what extent the analyst gives in to the temptation of selecting specific textual instances precisely because they contain those ideological/sociological features that they aim to highlight. It was precisely to address these inherent limitations of CDS / CDA that Corpus Linguistics (CL) came onto the scene of discourse studies in the 1990s (Leech and Fallon 1992; Caldas-Coulthard 1993; Stubbs and Gerbig 1993; Krishnamurthy 1995; Stubbs 1997). Being essentially empirical and based on machine-readable, authentic and representative corpora, CL allows for automatic processing of a huge quantity of texts which can be enriched with various metadata, thus avoiding the risk of ‘cherry picking’ involved in the intuition-based approach of qualitative discourse analyses (Gillings et al. 2023: 36).

Among the earliest and most significant attempts to bring together CL and CDA, a special mention should be given to Paul Baker’s volume, Using Corpora in Discourse Analysis (2006), a reference book which paved the way for further research conducted primarily at Lancaster University and focused on the ways sociopolitical domination is reproduced in discourse (Baker et al. 2008). It should be noted, however, that the label CADS (Corpus-Assisted Discourse Studies) was mostly used in the body of research that emerged in Italy through the initiatives of individual scholars (Partington et al. 2004; Morley and Bayley 2009; Partington et al. 2013) who focused on sociopolitical discourse analysis, which included unveiling significant ideological metaphors, discourse prosody and other recurring patterns in the language of political figures and institutions. As a result of the synergy between CL and CDA, CADS ‘explore discourse (i.e., language as a social practice) through examining corpora’, which ‘allows one to survey a corpus in its entirety rather than focusing only on certain texts’ (Gillings et al. 2023: 1). In addition, CADS are notably insightful as they are informed by a variety of theoretical perspectives, inherent to the methodologies underpinning them. While discussing the relevance and interdependence of the two fundamental components of CADS, Baker et al. (2008: 274) raise a series of interrelated issues and make the following key observations:

[i]n the combination of methods normally used by CDA and CL, […] neither CDA nor CL need be subservient to the other (as the word ‘assisted’ in CADS implies), [since] each contributes equally and distinctly to a methodological synergy. More precisely, we address the following interrelated questions.

1. What are the respective merits and limitations of methods of analysis traditionally used by CL and CDA when the focus is on issues that CDA traditionally examines?

2. What should be the nature of such a methodological synergy?

3. How can the combination in research projects, and their potential theoretical and methodological cross-pollination, benefit CDA and CL?

4. How helpful and/or justified is the distinction between what have traditionally been termed quantitative and qualitative approaches in linguistics?

As a summary of the responses to the aforementioned questions, it suffices to recall that the synergy between CL and CDA involves their combined application to gain deeper insights into language, power dynamics, and social structures. This synergy emerges when researchers combine the strengths of both approaches. On the one hand, CL offers quantitative tools for systematically analyzing extensive language datasets, revealing non-obvious patterns and frequencies in language use, thus reducing the potential bias of the analyst. On the other, CDA focuses on the qualitative interpretation of texts, disclosing ideological subtleties / power dynamics and contextualizing recurring linguistic features within social, political, and cultural settings. By so doing, ‘discourse studies (DS) can be enriched and made more rigorous by computer assistance’ (Gillings et al. 2023: 45), while the combination of quantitative and qualitative methodologies enhances the investigation of discourse, enabling scholars to understand how language both mirrors and shapes social reality, ideologies, and power dynamics.
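To make the division of labour described above more concrete, the following is a minimal Python sketch of a keyword-in-context (KWIC) concordance, one of the basic displays through which corpus tools hand frequency findings over to qualitative reading. It is an illustration only: the function names (`tokenize`, `kwic`), the very crude tokenizer and the sample sentences are assumptions introduced here and are not drawn from the volume or from any specific concordancing software.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, node, window=3):
    """Return (left context, node, right context) for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Invented toy text, not taken from any corpus discussed in the volume.
sample = (
    "The migrants were described as a flood. "
    "Officials said the migrants posed a risk. "
    "Volunteers welcomed migrants at the station."
)

tokens = tokenize(sample)
concordance = kwic(tokens, "migrants", window=3)

# Align the node word in a central column, as concordancers usually do.
for left, node, right in concordance:
    print(f"{left:>30} | {node} | {right}")
```

The counting and alignment are mechanical; deciding what the co-texts of the node word reveal about how a social group is represented remains the analyst's interpretative task.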

Through the CADS methodology, it becomes possible to address questions such as (Gillings et al. 2023: 1): how does language portray a specific social group? Which linguistic decisions align with certain ideological stances? Do discursive portrayals evolve over time? What significance do specific linguistic selections hold in institutional discourses? The common thread among all CADS studies is their focus on a specific social issue (e.g., inequality, poverty, racism, gender-related prejudice, violence or discrimination, etc.), transcending mere linguistic concerns. Alternatively, research may arise from a broader curiosity about the connections between a given social practice and its associated linguistic preferences. In brief, CADS effectively aid in dissecting the intricacies of discourse, making it appealing not only to linguists but also to researchers exploring the interplay between discourse and society across disciplines such as sociology, psychology, law, management, and beyond. Actually, a characteristic feature of CADS is their frequent reliance on methodological triangulation, i.e., resorting to an eclectic mix of techniques and methodological approaches to analyse a given social phenomenon. In this regard, Baker and Egbert (2016: 3) argue that

methodological triangulation facilitates validity checks of hypotheses, anchors findings in more robust interpretations and explanations, and allows the researcher to respond flexibly to unforeseen problems and aspects of the research. Such triangulation can involve using multiple methods, analysts, datasets, and it has been used for decades by social scientists as a means of explaining behavior by studying it from two or more perspectives (Webb et al. 1996; Glaser and Strauss 1967; Newby 1977; Layder 1993; Cohen and Manion 2000).

In addition to serving as an effective methodology to examine social phenomena according to quantifiable linguistic evidence, CADS can provide a crucial contribution to the study of textuality / text production norms in both specialized and non-specialized contexts. In fact, the synergy between CL and discourse studies at large allows researchers to analyse the data through a functionalist lens and delve, by way of example, into multiple aspects of the text-building process (Douglas et al. 1998; Gries 2009a, 2009b; Kabanoff 1997; Lindquist 2009; O’Keeffe and McCarthy 2022; Popping 2000) such as:

1. Pattern recognition: CADS enable the identification of recurring linguistic patterns within a corpus of texts, unveiling how texts are structured and organized.
2. Genre analysis: By studying various genres within a corpus, CADS reveal how text organization varies across different communicative contexts and genres.
3. Cohesion and coherence: Through CADS, researchers can examine lexico-grammatical devices that contribute to the coherence and cohesion of texts, enhancing our understanding of their organization.
4. Functional analysis: CADS facilitate the examination of linguistic choices that fulfill specific textual functions, elucidating how texts are organised to convey meaning effectively.
5. Thematic exploration: Researchers can employ CADS to uncover prevalent themes and topics across texts, shedding light on how texts are structured around particular key concepts.
6. Discourse markers and structure: CADS allow for the analysis of discourse markers and their role in signaling transitions and overall discourse structure, both in specialised and in general language.
7. Comparative study: Through CADS, text organization and lexico-grammatical features can be compared across different contexts, time periods, or genres, revealing variations and trends.
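As an illustration of the first item in this list, recurring word sequences can be extracted by counting n-grams and retaining only those that are both frequent and spread across several texts. The Python sketch below is a simplified example under stated assumptions: the function names, the whitespace tokenization, the threshold values and the three invented sentences are all hypothetical and do not reproduce the procedure of any chapter in the volume.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def recurrent_bundles(texts, n=3, min_freq=2, min_texts=2):
    """Map each n-gram meeting both thresholds to (frequency, number of texts)."""
    freq = Counter()
    text_range = defaultdict(set)  # n-gram -> indices of the texts containing it
    for idx, text in enumerate(texts):
        tokens = text.lower().split()  # naive whitespace tokenization
        for gram in ngrams(tokens, n):
            freq[gram] += 1
            text_range[gram].add(idx)
    return {
        " ".join(gram): (count, len(text_range[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(text_range[gram]) >= min_texts
    }

# Invented miniature "corpus" of three texts, for illustration only.
texts = [
    "we look forward to welcoming you to our hotel",
    "our staff look forward to welcoming you soon",
    "the hotel is close to the station",
]

bundles = recurrent_bundles(texts, n=3, min_freq=2, min_texts=2)
for bundle, (count, spread) in sorted(bundles.items()):
    print(f"{bundle}: frequency={count}, texts={spread}")
```

Requiring a minimum number of texts as well as a minimum frequency is a common way of preventing a single repetitive document from dominating the results; the specific cut-offs remain a decision for the researcher.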

A mere glance at the table of contents of this volume is enough to realize the multiplicity of research areas falling under the scope of CADS. Some chapters in the first part of the book appear more directly related to the first part of the acronym, Corpus Linguistics (CL), e.g., the preliminary reflections on the linguist’s complex role in Sentiment Analysis or in performing grammatical tagging, an operation that adds value to corpora and makes them suitable for multiple research purposes. Both chapters fall within the ambit of functional text analysis, which correlates recurring structures with specific discourse functions. Conversely, the chapters dealing with the conventional traits of specialized phraseology (e.g., the lexical bundles employed in tourism communication in hotel webpages in German) or with the analysis of particular constructions at the interface of syntax and phraseology (e.g., the study of the WAY-construction in English and Spanish, of ‘why-fragments’ in contemporary English, or of ‘if clauses’ in American English) could be categorised under the broader umbrella of ‘recurrent pattern recognition’ or ‘genre analysis’. On the other hand, the majority of studies in the second part of the volume are oriented towards the second component of the CADS acronym ([Critical] Discourse Studies) and examine the social practices deployed in specific discourses, e.g., electoral debates, TV series, social media like Twitter, or medical communication.

Last but not least, it is worth mentioning the potential limitations of CADS, thoroughly reviewed by Taylor and Marchi (2018). It must be emphasized that CADS are effective when this methodology is aligned with both the available data and the research questions. Its applicability varies: it may be deemed essential for some projects, useful but non-essential for others, and not worthwhile for others still. Indeed, methodological competence involves knowing when and how to bring CADS to bear or whether to use it at all. While CADS’ potential is now beyond question, their primary shortcomings can be outlined as follows (Gillings et al. 2023: 45–46):

1. CADS are unavoidably biased towards lexis, requiring identifiable words for analysis. This methodology is less suitable when the research question pertains to general discourse phenomena lacking clear ‘lexical hooks’ and unfolding over lengthy stretches of discourse, like argumentative strategies or extended metaphors.
2. They focus on words, limiting understanding of meaning in longer texts or dynamic social interactions.
3. CADS do not readily lend themselves to identifying absent phenomena / language patterns, as they primarily look for present elements. Comparisons between corpora can be helpful (Duguid and Partington 2018: 56), but interpretation remains the researcher’s responsibility.
4. CADS lean toward textual data, omitting non-verbal context like images, sounds or gestures. In this regard, advances in multimodal analysis are emerging: for example, Bednarek and Caple (2014: 151) proposed a new method called Corpus-Assisted Multimodal Discourse Analysis (CAMDA), combining corpus-based analysis with detailed scrutiny of other semiotic resources (see also Bednarek and Caple 2017; Caple 2018; Caple et al. 2020).
5. The metadata usually available in large corpora may lack contextual factors, hindering accurate analysis of age, gender, ethnicity, etc. The availability of a wider range of metadata in corpora annotation may result in broader research possibilities.
6. CADS cannot eliminate bias entirely, as key research design decisions remain subjective, e.g., decisions involving how to construct a corpus, which annotation scheme and software are considered more suitable for the analysis, the order of analytical steps, the frequency or statistical saliency thresholds, etc. (see Baker 2015: 286).
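The role of corpus comparison (point 3) and of saliency thresholds (point 6) can be illustrated with a small keyness calculation. The Python sketch below uses the log-likelihood statistic, a measure widely used in corpus linguistics for comparing word frequencies across two corpora; it is offered as one possible choice, not as the method adopted in the volume. The word counts, corpus sizes and function names are invented for illustration, and the three cut-offs are the conventional chi-square critical values for one degree of freedom (3.84, 6.63 and 10.83, corresponding to p < 0.05, p < 0.01 and p < 0.001).

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood keyness score for one word observed in two corpora."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    if freq_a > 0:  # a zero frequency contributes nothing to the sum
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll

def keywords(counts_a, size_a, counts_b, size_b, threshold):
    """Return the words whose score reaches the threshold, with rounded scores."""
    result = {}
    for word in set(counts_a) | set(counts_b):
        score = log_likelihood(
            counts_a.get(word, 0), size_a, counts_b.get(word, 0), size_b
        )
        if score >= threshold:
            result[word] = round(score, 2)
    return result

# Invented frequencies in a study corpus and a reference corpus of equal size.
study_counts = {"crisis": 30, "people": 50, "border": 12}
reference_counts = {"crisis": 10, "people": 48, "border": 4}
study_size = reference_size = 10_000

# The same data filtered at three conventional cut-offs.
by_threshold = {
    cutoff: keywords(study_counts, study_size, reference_counts, reference_size, cutoff)
    for cutoff in (3.84, 6.63, 10.83)
}
for cutoff, found in by_threshold.items():
    print(cutoff, sorted(found))
```

With these invented counts, the keyword list shrinks from two items to one and then to none as the cut-off rises, which shows how a seemingly technical setting shapes what the analyst goes on to interpret. The score as computed here does not indicate whether a word is over- or under-represented in the study corpus; that has to be read off the frequencies themselves.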

In the light of the above, one might even question whether the term ‘limitations’ fully applies. In fact, CADS should be assessed based on their intended purpose: if a method seems limiting, it might not be flawed but mismatched with research goals. Aligning the method with the research questions and with the data available undoubtedly enhances its potential.

2. Volume structure

The volume is divided into two parts. While the first part, CADS as a methodological approach, focuses on the corpus linguistic methodological approach to discourse analysis, the second part offers case studies.

The first part begins with Antonio Moreno-Ortiz’s chapter ‘The linguist’s role in sentiment analysis: From knowledge provider to data annotator’. Here the author gives an overview of how linguistics, corpus linguistics and computational linguistics have been used by researchers in sentiment analysis, presenting practical use cases, methods, systems, problems, and the advantages and disadvantages of different approaches, including lexicon-based and machine learning-based approaches and transformer-based models, which represent the state of the art in natural language processing.

Details

Pages: 372
Year: 2023
ISBN (PDF): 9783034348003
ISBN (ePUB): 9783034348010
ISBN (Hardcover): 9783034347532
DOI: 10.3726/b21393
Language: English
Publication date: 2024 (April)
Keywords: Corpus-Assisted Discourse Analysis; Corpus Linguistics; Critical Discourse Analysis; Sentiment Analysis; Discourse studies; Pragmatics; Media Discourse; Social Media
Published: Lausanne, Berlin, Bruxelles, Chennai, New York, Oxford, 2023. 372 pp., 25 fig. b/w, 56 tables.

Biographical notes

Stefania Maci is Full Professor of English Linguistics and Translation at the University of Bergamo, where she is the coordinator of the MA in Digital Humanities and Director of the Research Centre on Specialised Language. Her research is focussed on the study of the English language in academic and professional contexts. Giovanni Garofalo is Full Professor of Spanish Language and Translation at the University of Bergamo (Italy). Member of several research groups in Italy and abroad, he has participated in internationally significant research projects. His main interests encompass corpus-assisted discourse analysis, with an emphasis on specialized discourse and the discursive construction of gender.
