Corpus analysis is driven by a common interest in ‘linguistic evidence’, viewed as a source of insights into language phenomena or of lexical, semantic and contrastive data for subsequent applications. Among the latter, pedagogical settings are highly prominent, as corpora can be used to monitor classroom output, raise learner awareness and inform teaching materials.
The eighteen chapters in this volume focus on contexts where English is employed by specialists in the professions or academia and debate some of the challenges arising from the complex relationship between linguistic theory, data-mining tools and statistical methods.
MAURIZIO GOTTI / DAVIDE S. GIANNONI

Introduction

1. Corpus analysis and specialised discourse

The study of language use through documentary evidence gleaned from variously large collections of authentic texts pre-dates by centuries the modern science of corpus linguistics. Since Samuel Johnson's landmark Dictionary of the English Language (1755), lexicographers and the reading public have become aware that in language matters intuition is not enough, for the actual meaning/usage of words varies over time, from place to place and contextually. Driven by a similar interest, medieval scholars pioneered the first Bible Concordances (Schenker 2003), documenting the frequency and semantic range of root words in Scripture. Similar concordances were compiled after the advent of print from the works of literary classics such as Chaucer, Shakespeare and Milton, to name but a few.

Despite these early examples, the realisation that language description should always be corroborated by textual evidence is a relatively new development in linguistic research, with the main thrust coming from computational linguistic techniques in the 1950s, and the subsequent appearance of electronic computing machines (cf. Sinclair et al. 1970). The spread of personal computers in the late 1980s, combined with the inception of online media in the 1990s, has revolutionised the field in two major directions:

• widespread accessibility of huge amounts of data with no agreed guidelines for its collection, storage or processing;
• a dramatic shift from manual analysis to automatic data mining, based on dedicated software applications and increasingly complex statistical tools....