Specialisation and Variation in Language Corpora


Edited By Ana Diaz-Negrillo and Francisco Javier Diaz-Pérez

Corpus linguistics began with the compilation and exploitation of native English reference corpora. In recent years the field has expanded and specialised to such an extent that a variety of languages, registers, text types and speakers are now represented in language corpora. This volume documents that expansion. It focuses on emerging types of corpora and corpus techniques, and also presents corpus-based studies in areas that have benefited from recent developments in corpus linguistics methods and techniques, including foreign language teaching, language acquisition, translation and terminology, dialectology, lexicography and language variation. The volume comprises 11 papers on technical aspects of corpus data processing, on corpus-based linguistic research, and on emerging corpora. It is structured in three main sections, one for each of these three areas.

AixOx, a multi-layered learners’ corpus: automatic annotation: Sophie Herment, Anne Tortel, Brigitte Bigi, Daniel Hirst, Anastassia Loukina





This paper presents a multilingual learners' corpus, AixOx, collected in the framework of an Alliance project (a partnership between the British Council and the French Ministry of Foreign Affairs). The corpus consists of recordings of 40 one-minute passages in English and French from the Eurom 1 corpus (Chan et al. 1995), read by native speakers and L2 learners. French native speakers reading the French and English passages were recorded in Aix-en-Provence, and English native speakers reading the English and French passages were recorded in Oxford. The AixOx corpus contains about 40 hours of read speech and can be downloaded from the Speech and Language Data Repository. This paper also presents the tools used for automatic annotation on several layers: SPPAS (SPeech Phonetization Alignment and Syllabification; Bigi 2012) for segmentation into utterances, words, syllables and phonemes; and MoMel (Modelling Melody) and INTSINT (INternational Transcription System for INTonation; Hirst 2007) for the modelling and coding of intonation. Finally, an example of a pedagogical application of the corpus is given: a pilot study on the intonation of questions. We show how the AixOx corpus can be used to compare the productions of native speakers with those of learners and how it is possible, thanks to the annotation, to understand the prosodic realisations (whether they be positive or negative) and...
