Approaching Language Variation through Corpora

A Festschrift in Honour of Toshio Saito


Shunji Yamazaki and Robert Sigley

This book is a collection of papers using samples of real language data (corpora) to explore variation in the use of English. This collection celebrates the achievements of Toshio Saito, a pioneer in corpus linguistics within Japan and founder of the Japan Association for English Corpus Studies (JAECS).
The main aims throughout the collection are to present practical solutions for methodological and interpretational problems common in such research, and to make the research methods and issues as accessible as possible, so as to educate and inspire future researchers. Together, the papers represent many different dimensions of variation, including: differences in (frequency of) use under different linguistic conditions; differences between styles or registers of use; change over time; differences between regional varieties; differences between social groups; and differences in use by one individual on different occasions. The papers are grouped into four sections: studies considering methodological problems in the use of real language samples; studies describing features of language usage in different linguistic environments in modern English; studies following change over time; and case studies illustrating variation in usage for different purposes, or by different groups or individuals, in society.


ROBERT SIGLEY Assessing Corpus Comparability Using a Formality Index: The Case of the Brown/LOB Clones


1. Introduction [1]

For a variety of reasons, corpus linguists often want to make comparisons between different corpora. For example, we may want to check that our results are reproducible in other samples; to compare different varieties of a language; or to look at how language changes across time by comparing samples from different time periods.

The Brown/LOB clones listed in Table 1 are a set of corpora which were designed specifically to let researchers make such comparisons. Each is a 1-million-word sample of edited written English, consisting of 500 text samples of 2000 words each, based on the sampling frame used for the Brown corpus of American English (Francis 1964) and the LOB corpus of British English (Johansson 1978). Both Brown and LOB represent texts published in 1961. Christian Mair’s team at Freiburg, Germany, constructed the parallel Frown and FLOB corpora using American and British texts published between 1990 and 1992 (Hundt/Sand/Siemund 1998; Hundt/Sand/Skandera 1998). The Wellington Corpus of Written New Zealand English (Bauer 1993; henceforth WWC) is a parallel corpus of New Zealand writings from 1986-90. Thus we can compare American, British and New Zealand writings using Frown, FLOB, and WWC; and we can study change between 1961 and 1991 by comparing Brown with Frown, or LOB with FLOB (see for

[1] The author would like to thank Bernadette Vine, Mark Chadwick, Laurie Bauer, and Andrea Sand for assistance in accessing the various corpora....

