Exploring discourse and ideology through corpora

by Miguel Fuster Márquez (Volume editor), José Santaemilia (Volume editor), Carmen Gregori-Signes (Volume editor), Paula Rodríguez-Abruñeiras (Volume editor)
Edited Collection 292 Pages
Series: Linguistic Insights, Volume 276

Table of Contents

  • Cover
  • Title
  • Copyright
  • About the author
  • About the book
  • This eBook can be cited
  • Contents
  • Insights from corpus-assisted discourse analysis: Unveiling social attitudes and values: Miguel Fuster-Márquez, José Santaemilia, Carmen Gregori-Signes and Paula Rodríguez-Abruñeiras
  • Post-history, post-democracy, post-truth, post-Trump? Really? A corpus-assisted study of delegitimisation via argument strategies: ‘dirty tricks’, evaluation and hyperbole in modern political discourses: Alan Partington
  • Analysing the impacts of 19th-century drought: A corpus-based study: Tony McEnery, Helen Baker and Carmen Dayrell
  • Coverage of the far-right in the Spanish written press: The case of Vox: Salvador Enguix-Oliver and Beatriz Gallardo-Paúls
  • Evaluation in Theresa May’s political discourse: A study of the PM’s seminal Brexit speeches: Ana Belén Cabrejas-Peñuelas and Rosana Dolón
  • ‘Nobody is guilty in football. That’s the first thing to understand’: A corpus-assisted critical discourse analysis of the UK press coverage of the Ched Evans case: Leanne Victoria Bartley
  • The role of news values in the discursive construction of the female victim in media outlets: A comparative study: Sergio Maruenda-Bataller
  • How does violence-motivated online discourse differ from its non-violent counterpart? Insights from a CADS approach: Alfonso Sánchez-Moya
  • ‘We’ll watch TV and do other stuff’: A corpus-assisted discourse study of vague language use in online child sexual grooming: Nuria Lorenzo-Dus and Anina Kinzel
  • The narrative of the anti-vax campaign on Twitter: Stefania M. Maci
  • Debating Saudi womanhood: A corpus-aided critical discourse analysis of the representation of Saudi women in the Twitter campaign against the ‘Male Guardianship’ system: Nouf Alotaibi and Jane Mulderrig
  • Notes on Contributors
  • Series index

Miguel Fuster-Márquez, José Santaemilia, Carmen Gregori-Signes and Paula Rodríguez-Abruñeiras

Insights from corpus-assisted discourse analysis: Unveiling social attitudes and values

The exploration of discourse through corpus linguistics techniques is a rapidly expanding field. Indeed, Corpus-Assisted Discourse Studies (CADS), as introduced and defined by Alan Partington, has become a very popular methodology or approach for the critical and also non-critical analysis of discourses. Publications have proliferated over the last decade, problematising their objects of study and steadily widening their scope and focus. These trends have encompassed a wide range of interests, and applying specific corpus techniques, combining different methodologies, or drawing on information from various relevant sources has proved an appealing strategy. There is already a well-established tradition of disciplines that have benefitted from the application of corpus techniques, including applied linguistics, variationist studies, specialised languages, historical linguistics, pragmatics, forensic linguistics, learner language, translation and stylistic studies, among others. Additionally, young scholars who wish to engage in corpus-related research today have at their disposal handbooks and companions published by the most prestigious publishing houses, as well as a large number of public corpora, web sources and software that will pave the way for their research. While some publications have offered overviews of corpus linguistics, others have focused more specifically on statistical methods. This is a helpful reminder that quantification is a crucial and distinctive aspect of most kinds of corpus research, perhaps inextricably bound to it. In contrast with earlier linguistic analysis, which focused on single texts or dealt with small amounts of data, dealing empirically with large amounts of text in corpora, not infrequently millions of words and hundreds or thousands of texts, brings with it the need for solid and reliable quantification methods. Exploring large amounts of data has meant a change of research priorities, and older, purely qualitative methods are simply not suitable for it. This also means that corpus linguistics, being more thoroughly empirical, has been moving away from the humanities to embrace methods and approaches which are regularly found in the social sciences.

It is increasingly evident that corpus methodologies have already greatly contributed to discourse analysis and shown their potential to unveil political or ideological attitudes, values, or social inequalities. The implementation of corpus techniques, which are refined further every day, allows us to document social attitudes, to expose injustice or sexism and, hopefully, to fight against discrimination by obtaining results which are more generalisable. Until quite recently, the study of social issues, identity politics or ideological values has favoured qualitative analyses of small collections of texts. Today, however, more and more researchers are convinced that the use of corpus linguistics for social analysis constitutes a very powerful instrument of social inquiry, especially if used in combination with other (qualitative) approaches which require close reading and the interpretation of particularly salient textual features. Corpus-Assisted Discourse Studies (CADS) today enjoys great popularity and academic appeal, partly because of its ability to move back and forth, in a complementary way, between qualitative and quantitative techniques in order to generate new hypotheses and to test existing ones.

Contributions to this volume deal with socio-ideological issues such as political and media discourses, gender-based studies or hate language, which have increasingly become the object of corpus linguistics research, as have their evaluative, emotive or attitudinal dimensions. Emerging trends in research have come from the application of CADS to the interactive discourses found on social networks, and some contributions in this volume bear witness to this trend. The contributions share a committed view on social reality. To meet their goals, they use corpus tools to ensure a balanced quantitative and qualitative analysis of (large or small) corpora, on the one hand, and take a critical approach to the ideological nuances present in texts, on the other. This book offers a wide range of analyses and insights into burning and/or conflicting social issues.

Alan Partington’s chapter, entitled “Post-history, post-democracy, post-truth, post-Trump? Really? A corpus-assisted study of delegitimisation via argument strategies: ‘dirty tricks’, evaluation and hyperbole in modern political discourses,” illustrates the balanced use of qualitative and quantitative approaches in the examination of a series of contemporary post-denominations (post-history, post-democracy, post-truth, post-Trump) in order to unveil the ideological delegitimisation of opposing groups or individuals. In this chapter, Partington insightfully addresses a number of crucial problems which have to do with the difficult synergy of Corpus Linguistics and more classical qualitative Discourse Studies. For this particular piece of research, Partington has made use of different corpora. In his view, researching ‘macro’ argument strategies (i.e. de/legitimisation) is a way of addressing one of the ‘dusty corners’ (Taylor/Marchi 2018) in corpus linguistics, which tends to favour a (sometimes decontextualised) scrutiny of large bodies of texts by focusing on ‘micro’ strategies. For Partington, the analysis of macro-level features should be “‘assisted’ by the researcher’s intuitions” and world knowledge.

Tony McEnery, Helen Baker and Carmen Dayrell are the authors of the second chapter, “Analysing the impacts of 19th-century drought: A corpus-based study”, where they turn their attention to the past in order to fill information gaps about the weather through textual data that could be useful to historians and meteorologists. In doing so, they demonstrate the usefulness of applying corpus methodologies to large amounts of historical media texts (in this case, a large sample of 19th-century British newspapers reporting on climate conditions in different parts of the country). The rigorous examination of discourses can become an indispensable contribution of corpus linguistics to non-linguistic fields such as the one dealt with here, namely weather conditions in earlier centuries, which cannot be accounted for directly through sophisticated modern standard methods. Nevertheless, the authors deem it most appropriate to make use of triangulation; hence they use corpus methods in concert with other techniques (concordance geo-parsing and close reading analyses) with a view to reconstructing, as faithfully as possible, media narratives of the droughts and water shortages in different regions of the UK.

Political discourse, one of the favourite targets of corpus-based analysis, is illustrated in this volume by Salvador Enguix-Oliver and Beatriz Gallardo-Paúls’s study of recent political developments related to the far-right Spanish party Vox during 2018 in their contribution “Coverage of the far-right in the Spanish written press: The case of Vox”. With this aim in mind, the authors have selected a number of widely read Spanish quality newspapers (El País, El Mundo, La Vanguardia and ABC) and digital dailies (El Español, eldiario.es, Público and Infolibre). According to the authors, these newspaper items belong to their own sub-corpus PRODISNET-201, which is part of the larger PRODISNET corpus (Discursive Processes on the Internet). The hypothesis explored in this chapter is that far-right parties like Vox have been, and are still, receiving largely undeserved media coverage in Western democracies, and that this media attitude enhances their public visibility, which results in unprecedented electoral success. The authors resort to quantitative methods such as sentiment analysis (using the software Lingmotif, which reveals a basically negative appraisal) and complement their study with close pragmatic analyses of all texts, since they find, for example, that political evaluation in the media is often expressed implicitly. In their conclusion they highlight that negative political evaluations are most often carried out through presuppositions and anomalous implicatures of manner and quality.

In “Evaluation in Theresa May’s political discourse: A study of the PM’s seminal Brexit speeches”, Ana Belén Cabrejas-Peñuelas and Rosana Dolón focus on a burning issue, the Brexit process, as exemplified in three seminal speeches by the former British PM Theresa May. The speeches were delivered during key moments of the Brexit negotiations: the Lancaster House speech (January 2017), the Florence speech (September 2017) and the London Mansion House speech (March 2018). The authors home in on the verbal content of this corpus to show how evaluation of status (see Hunston 2000, 2008, 2011) is expressed in political texts, and reveal how corpus linguistics can meaningfully contribute to the study of evaluation and persuasion. The typological framework applied here has the earlier work by Díez-Prados & Cabrejas-Peñuelas (2018) as an acknowledged antecedent. This contribution makes use of the freeware program UAM Corpus Tool, developed by Mick O’Donnell (2019). The corpus was annotated manually so that the inferential statistical analysis could then be carried out automatically. According to the authors, a key persuasive strategy in Theresa May’s speeches is the use of rhetorical devices of “suggestion” as against “proof”, and the maintenance of an apparently objective stance (Hunston 2011: 27). The linguistic choices made by the former British PM seem to pursue a specific persuasive intention: the projection of a down-to-earth and objective position.

Gender issues, and specifically violence against women (VAW), feature prominently in the growing research that makes combined use of corpus linguistics and other methodologies, as shown in Leanne Victoria Bartley’s chapter “‘Nobody is guilty in football. That’s the first thing to understand’: A corpus-assisted critical discourse analysis of the UK press coverage of the Ched Evans case.” Bartley focuses on judicial events which took place in 2012 in the UK, when the famous footballer Ched Evans was convicted of raping a young woman in a hotel room in Rhyl, North Wales, after a night out together. In 2016, Evans’ case was reviewed, his conviction overturned, and he also received compensation of £800,000. Bartley explores the approaches of the British press (The Daily Mail, The Guardian, The Mirror, The Sun) during this period by means of a CADS approach, based on Martin and White’s (2005) appraisal theory, slightly modified by Bednarek’s (2008) recommendations concerning the subcategory of affect. She makes use of UAM Corpus Tool (O’Donnell 2019), which has this paradigm built in for quantitative and qualitative analysis. Bartley investigates how both the perpetrator and his victim were represented in the British press at different stages of this controversial case. She finds that the media discourses changed over time: the view of the alleged perpetrator went from harsh criticism to a positive image. The focus on the victim, by contrast, went from understanding her inability to consent to sexual intercourse to criticising her sexual behaviour as inappropriate.

In the following chapter, “News values in construing female victims of VAW discourses in the media: A view from CADS,” Sergio Maruenda-Bataller also addresses a gender issue. His aim is to explore how mainstream Spanish and British quality newspapers construed female victims of VAW discursively during the decade from 2005 to 2015. Classifying the news values that are used to depict female victims, he examines and compares the treatment that Spanish and British newspapers give to these victims in VAW news reports. For this purpose, the author makes use of a large purpose-built corpus, to which he applies the Discursive News Values Analysis paradigm as described by Bednarek and Caple (2017). For the use of corpus methods he also relies on Potts et al.’s (2015) study, whose authors likewise used a large corpus to investigate the representation of Hurricane Katrina in the news. While more research is needed, it is the author’s view that the results appear to substantiate the uneven presence of two complementary discourses which are nevertheless inextricably linked: a discourse of death, violence and suffering (hence the predominance of news values such as negativity and impact) and another of institutional and social support (which realises the news value of consonance).

Alfonso Sánchez-Moya also explores the gender issue of VAW in the chapter “How does violence-motivated online discourse differ from its non-violent counterpart? Insights from a CADS approach.” He makes use of LIWC, a sentiment software tool, to analyse the discourse of female survivors of Intimate Partner Violence (IPV) in a corpus of online forum messages, totalling 120,000 words, written by women who are undergoing such violence. Insights into this phenomenon from a discursive approach are, in the author’s words, scarce. Among his conclusions is that, when compared to a neutral/non-violent corpus of forum messages by women, the discourse in this IPV-related forum appears to be characterised by a narrative-oriented and personal style, focusing more on the here-and-now. In particular, the sub-corpus dealing with abuse within the IPV corpus shows a severely pessimistic type of discourse. The analysis also validates earlier claims about IPV-related negative emotions and their discursive characterisation.

In “‘We’ll watch TV and do other stuff’: A corpus-assisted discourse study of vague language use in online child sexual grooming,” Nuria Lorenzo-Dus and Anina Kinzel show the practical application of a CADS methodology to very serious sexuality-related issues, namely a recent escalation in the UK of cases of online child sexual grooming (OCSG), defined as “an Internet-enabled communicative process of entrapment”. The authors’ aim is to identify the linguistic and rhetorical tactics used by adults convicted of OCSG in order to assist law enforcement agencies and educational institutions. The large corpus of chat logs was compiled between 2004 and 2016. The corpus was uploaded to CQPweb, and the authors used VARD to standardise the spelling. Among their conclusions they highlight the strategic use of implicitness and vague language for manipulative goals, and their results show that the main category used by groomers is that of quality-approximators, followed by vague identifiers and de-identifiers. In short, groomers manipulate their communications with the children they target in ways which confuse them greatly.

Stefania M. Maci presents the chapter “The narrative of the anti-vax campaign on Twitter”, on medical knowledge as disseminated by digital media. Maci describes the discourse strategies employed by US anti-vaccine activists, whose aim is to spread fake news about vaccination, in a small Twitter corpus of 16,768 tweets (75,960 running words) gathered in October 2018 after some children died from measles in New York City. The corpus has been semantically annotated with USAS to identify key semantic domains, and Wmatrix has been used for the automatic processing of the most relevant meanings. Maci also carried out a sentiment analysis using Socialbearing.com. The results reveal an overall strong, articulate anti-vaccine discourse in the tweets of these activists. Maci claims that, although their arguments are apparently grounded in scientific truths, these activists promote a conspiracy theory involving the government and the pharmaceutical industry.

Last but not least, Nouf Alotaibi and Jane Mulderrig are the authors of the chapter that closes this volume, “Debating Saudi womanhood: A corpus-aided critical discourse analysis of the representation of Saudi women in the Twitter campaign against the ‘Male Guardianship’ system.” Again, gender politics features prominently here, as the chapter focuses on a highly controversial topic (‘male guardianship’) affecting the freedoms, and the whole lives, of millions of women in Saudi Arabia. The authors analyse Saudi women’s attitudes towards the online campaign #سعوديات_نطالب_اسقاط_الولاية (i.e. ‘#EndMaleGuardianshipSystem’ – #EMGC), which started in 2016 and is polarising Saudi women’s positions into two groups, Pro-#EMGC and Anti-#EMGC. The paper seeks to answer three questions: “1) who were the salient social actors in the discourse of #EMGC?; 2) how did the female campaigners represent Saudi women as social actors?; and 3) what were their most prominent actions?” To that end they use a corpus-aided critical discourse approach, bringing together the potential of corpus tools (the word list and concordance tools provided by the concordance software ‘AntConc’ – see Anthony 2017) to reveal key textual patterns and the insights drawn from CDA and the representation of social actors (van Leeuwen 2008). This chapter shows, as do the others in this volume, that a synergy between corpus and discourse analytical tools is crucial in identifying, analysing and even advancing social issues, in this case the women’s rights issue in Saudi Arabia.

Readers should note that authors in this volume have relied on data contained in either small or large corpora. It is worth bearing in mind that, just as there is no specifically recommended methodology attached to the analysis of (critical) discourses, there is no agreed criterion about the size of corpora suitable for research either. Perhaps ‘the bigger the better’ is a good guiding principle, and one seen in many published papers, since large corpora are more likely to yield generalisable findings and insights; however, small targeted corpora, as shown here, are perfectly suitable for many kinds of corpus research into discourse.

Also, although not a few analysts believe that the use of corpus linguistics tools is a thoroughly automatic process, where the “machine”, not the analyst, decides the results obtained through quantification, this in fact constitutes a very poor understanding of the way corpus analysis is actually applied. It is researchers, not the computer software, who need to use their intuition and knowledge to decide many aspects throughout their corpus research. For example, as shown in this volume, not a few authors have chosen specific annotation tools and, at times, annotation is also carried out manually. Other authors decide not to annotate their corpus. In any case, no two annotation systems are equal. Therefore, a variety of annotation tools can be used, in the same way as the “kind” and “size” of corpus is decided by the researchers according to their particular objectives. As to the corpus software, the contributions in this volume also show a variety of software tools and platforms, and the answer as to which ones to apply lies entirely with the researcher. As to quantification, different statistics are also displayed. All of these are decisions or choices which have to be made by researchers, and they show the resourcefulness of corpus linguistics and why it is becoming more and more popular in discourse analysis research.

Finally, corpus linguistics has been enriched by the incorporation of larger social, historical, ideological and ethical considerations, brought about by critical approaches to discourse. Certainly, this volume is not the first one to be published around these issues. However, it attempts to provide readers with updated research from leading international scholars in Discourse Analysis. All CADS researchers found in this volume are well aware of the strengths which come from applying a methodological synergy that combines different kinds of quantitative and/or qualitative research. A critical attitude towards the choice of methods and the author’s stance undoubtedly contributes to more fine-tuned analyses and to a more responsible and reflexive practice.

We believe this edited volume is an excellent example of this growing symbiosis between methods, types of corpora, disciplines, software programmes, discourse perspectives, and social and ethical concerns.


This book explores discourse mainly through corpus linguistics methods. Indeed, Corpus-Assisted Discourse Studies has become a widely used approach for the critical (or non-critical) analysis of discourses in recent times. The book focuses on the analysis of different kinds of discourse, but most particularly on those which attempt to unveil social attitudes and values. Although a corpus methodology is deemed crucial in all research found here, it should not be inferred that a single, uniform technique is applied; rather, a wide variety of techniques is used, often shaped by the software chosen. Also, the critical analysis of discourses often calls for more than one (qualitative or quantitative) methodology, or for drawing on various relevant sources.


ISBN (Hardcover)
Publication date
2021 (March)
Bern, Berlin, Bruxelles, New York, Oxford, Warszawa, Wien, 2021. 292 pp., 25 fig. col., 15 fig. b/w, 31 tables.

Biographical notes

Miguel Fuster-Márquez is an associate professor of English at the Universitat de València. His teaching includes English lexicology, English sociolinguistics and corpus linguistics both at undergraduate and graduate levels. His fields of interest are related to the application of corpus techniques to different disciplines.

José Santaemilia is a professor of English at the Universitat de València, where he teaches legal translation (English-Spanish/Catalan) as well as professional deontology and ethics and introduction to translation research. His areas of interest include gender and language studies, gender/sexuality and translation, and legal translation.

Carmen Gregori-Signes is an associate professor of English at the Universitat de València. She teaches corpus linguistics, (critical) discourse analysis, and grammar both at undergraduate and graduate levels. Her research interests include corpus-assisted (critical) discourse analysis, and the representation of gender in telecinematic and media discourse.

Paula Rodríguez-Abruñeiras is an associate professor of English at the Universitat de València. Both her teaching and her research have always revolved around the history of English. She is also interested in corpus linguistics, the new varieties of English, and gender studies.

