Studies in Learner Corpus Linguistics

Research and Applications for Foreign Language Teaching and Assessment

by Erik Castello (Volume editor), Katherine Ackerley (Volume editor), Francesca Coccetta (Volume editor)
©2016 Edited Collection 358 Pages
Series: Linguistic Insights, Volume 190

Summary

This volume explores the potential of using both cross-sectional and longitudinal learner corpora to investigate the interlanguage of learners with various L1 backgrounds and to subsequently apply the findings to language teaching and assessment. It is made up of 18 chapters selected from papers presented at the international conference «Compiling and Using Learner Corpora», held in May 2013 at the University of Padua, Italy. The chapters discuss current issues and future developments in the use of learner corpora, and present case studies based on teaching and assessment experiences in various contexts as well as longitudinal corpus-based studies conducted within the Longitudinal Database of Learner English (LONGDALE) project. Other chapters report on investigations of specific aspects of the interlanguage of a variety of learner populations, and the final chapters address issues of corpus compilation and representativeness. The majority of the contributions draw on data produced by EFL learners from Germany, Italy, Japan, Spain, and the Netherlands, while others concern learners of Italian and Spanish as Foreign Languages.

Table Of Contents

  • Cover
  • Title
  • Copyright
  • About the author
  • About the book
  • This eBook can be cited
  • Contents
  • Introduction
  • Section 1: Learner Corpora for Language Teaching and Assessment
  • Using Learner Corpora in Language Testing and Assessment: Current Practice and Future Challenges
  • Dealing with Errors in Learner Corpora to Describe, Teach and Assess EFL Writing: Focus on Article Use
  • Using Learner Corpora to Order Linguistic Structures in Terms of Apparent Difficulty
  • Focus on Form in Computer-Mediated Communication: Using Written Learner Data to Foster Language and Pragmatic Skills in Communicative Contexts
  • The Compilation and Use of a CMC Learner Corpus for Japanese University Students
  • Section 2: Longitudinal Learner Corpus-based Studies
  • Introduction to the LONGDALE Project
  • Nouns and Noun Phrases in Advanced Dutch EFL Writing: From Quantitative to Qualitative Longitudinal Data Analysis
  • I didn’t really *understood what it was about, but it really *made fun: A Longitudinal Corpus-based Study of Tense/Aspect and High-frequency Verbs in Learner English
  • Assessing Advanced EFL Students’ Proficiency at Producing Affect-laden Discourse
  • Towards a Longitudinal Study of Metadiscourse in EFL Academic Writing: Focus on Italian Learners’ Use of it-extraposition
  • Short-term Effects of Students’ Exploration of Corpora: A Longitudinal Study of Pre- and Post-modification of Noun Phrases in Learner English
  • Section 3: Language Corpora for the Analysis of Interlanguage
  • Analysing the Language of Interpersonal Relations in Corpora of Elicited Learner and Native Speaker Interactions in English
  • From Learner to Expert: Using a Corpus to Analyse the Use of Must by German Advanced Students of English
  • Spanish Copulas and the Interlanguage of Iraqi University Students
  • Phraseology in Academic L2 Discourse: The Use of Multi-words Units in a CMC University Context
  • Section 4: Learner Corpus Compilation and Representativeness
  • Representing Learner English in a Specialized Corpus: Genre and Proficiency Level in the Advanced Learner Corpus of Argumentative Student Essays (ALCASE)
  • Connecting Data Elicitation and Pedagogical Practice in Learner Corpus Design: The Case of TILCE – the Turin Italian Learner Corpus of English
  • A Generic Data Workflow for Building Annotated Text Corpora
  • Notes on Contributors

Section 1:
Learner Corpora for Language Teaching and Assessment

MARCUS CALLIES

Using Learner Corpora in Language Testing and Assessment: Current Practice and Future Challenges

1.   Introduction

Learner Corpus Research (LCR) has contributed significantly to the description of advanced interlanguages, and many of its findings have resulted in useful applications for second/foreign (L2) language teaching and learning. Learner- and native-speaker corpora are also receiving increasing attention in the field of Language Testing and Assessment (LTA) to approach the construct of L2 proficiency. LTA as a subfield within Applied Linguistics subsumes a vast area of different assessment and testing contexts, in which the terms ‘testing’ and ‘assessment’ are often used interchangeably. The two terms both refer to “the systematic gathering of language-related behavior in order to make inferences about language ability and capacity for language use on other occasions” (Chapelle/Plakans 2013: 241). However, the term ‘assessment’ is often used to refer to a more varied process of data gathering and interpretation than ‘testing’, which applies to assessment practices in institutional contexts (Chapelle/Plakans 2013: 241). Focusing on productive skills, mainly writing, the present chapter discusses the ways in which learner corpora can be used for the assessment of L2 proficiency, highlighting their benefits, but also some challenges linked to their use. Proposing a threefold distinction of practical applications of corpora in LTA, I will point out some major methodological issues in LCR as they pertain to learner corpus compilation, analysis, and their use in LTA. The chapter closes with an example of how learner corpora can increase transparency, consistency and comparability in the assessment of L2 writing proficiency using a data-driven approach that is partially independent of human rating.

2.   Learner Corpora in Language Testing and Assessment: Three Approaches

In Corpus Linguistics, there is a general distinction between corpus-based and corpus-driven approaches to the study of language. The former is sometimes narrowly considered

a methodology that avails itself of the corpus mainly to expound, test or exemplify theories and descriptions that were formulated before large corpora became available to inform language study (Tognini-Bonelli 2001: 65).

From this point of view, a corpus is used as evidence to corroborate pre-existing linguistic description, e.g. as a source of examples to check researchers’ intuition or to examine the frequency of occurrence of a specific linguistic phenomenon. By contrast, corpus-driven approaches make minimal prior assumptions about language structure and are said to be more ‘inductive’, since the corpus itself is the data and the patterns of language use found in the corpus become the basis for defining regularities and exceptions in language. While this seemingly clear distinction is probably overstated (McEnery/Xiao/Tono 2006: 8), ‘corpus-based’ is the more general term of the two because it is often used in a much wider sense than the narrow definition given above, referring to any work that makes use of a corpus.

Callies/Díez-Bedmar/Zaytseva (2014) propose a threefold distinction of how learner corpora have been and can be applied in the field of LTA. They suggest three criteria to classify applications of learner corpora in LTA as CORPUS-INFORMED, CORPUS-BASED and CORPUS-DRIVEN: (1) the way corpus data are actually put to use, (2) the aims and outcomes for LTA, and (3) the degree of involvement of the researcher in data retrieval, analysis and interpretation. Callies/Díez-Bedmar/Zaytseva (2014) stress that these are not strict distinctions but that the three approaches may overlap or even merge in some practices.

In CORPUS-INFORMED applications, often the only type of approach discussed in most overview surveys of the use of learner corpora in LTA (e.g. Barker 2010, 2013), learner corpora are used to inform test content or to validate human raters’ claims (e.g. Alderson 1996; Barker 2010, 2013; Taylor/Barker 2008). In other words, “corpora can reveal what language learners can do, which informs both what is tested at a particular proficiency level and how this is rated” (Barker 2013: 1361). Barker (2013: 1360) also points out that “corpora and related techniques can be used throughout the cycle of planning, developing, delivering, and rating a language test”. She identifies three main ways in which corpora are used in LTA: defining user needs and test purpose, designing tests, and refining task rating. Barker (2013: 1361) also notes a “steady increase in language testers’ willingness to engage with corpora in relation to designing, validating and rating language tests” and that “corpora are being developed and used specifically for language testing and that these uses are being argued for and critiqued within the field rather than being adopted by all language testers” (Barker 2013: 1362).

In CORPUS-BASED approaches, corpus data are used to explore learner language, often – but not necessarily – comparing it to the language of native speakers (NS), in search of empirical evidence confirming or refuting a researcher’s hypothesis. Recently, researchers have turned to learner corpora to inform, validate, and advance the way proficiency is operationalized in the Common European Framework of Reference for Languages (CEFR; Council of Europe 2001). For example, the work by Hawkins/Filipovic (2012) on so-called “criterial features” in L2 English aims to make the information on proficiency levels more explicit by adding “grammatical and lexical details of English to CEFR’s functional characterisation of the different levels” (Hawkins/Filipovic 2012: 5). Their approach involves comparisons of particular linguistic features used by learners and NSs in two corpora: the Cambridge Learner Corpus (CLC) composed of exam scripts produced at different proficiency levels, and a reference corpus of NS English, the British National Corpus (BNC). Depending on similarities and differences in usage patterns across these corpora, linguistic features acquire the status of either positive or negative linguistic properties respectively, and are interpreted as criterial features that are “characteristic and indicative of L2 proficiency at each level, on the basis of which examiners make their practical assessments” (Hawkins/Filipovic 2012: 6). Similarly, another strand of research within LCR combines both the CEFR proficiency levels and Computer-aided Error Analysis to provide insights into those areas that are problematic at various proficiency levels to validate human ratings (e.g. Díez-Bedmar 2011; Thewissen 2013).
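
The kind of comparison that underlies such criterial features can be illustrated with a minimal sketch. It assumes that the learner data are already grouped by proficiency level and that a feature is a single word form. The function names, the toy sentences and the per-1,000-token rate are illustrative assumptions, not Hawkins/Filipovic's procedure or data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def per_thousand(texts, feature):
    """Relative frequency of `feature` per 1,000 tokens across a list of texts."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[feature] / len(tokens)


# Invented toy data; these sentences are NOT drawn from the CLC or the BNC.
learner_by_level = {
    "B1": ["I must go to school", "you must to see this film"],
    "C1": ["one might argue that this must be wrong"],
}
reference = ["it might be argued that the claim must be revised"]

feature = "must"
ref_rate = per_thousand(reference, feature)
rates = {
    level: per_thousand(texts, feature)
    for level, texts in learner_by_level.items()
}

# Report each level's rate next to the reference rate.
for level, rate in rates.items():
    print(f"{level}: {rate:.1f} per 1,000 tokens (reference: {ref_rate:.1f})")
```

With real corpora the raw rates would need to be backed by a significance test and by dispersion across learners, since a handful of texts can easily dominate a per-level rate.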

Finally, CORPUS-DRIVEN approaches presuppose the least degree of involvement by the researcher, since they rely on computer techniques for data extraction and evaluation. Here, ‘corpus-driven’ truly means ‘data-driven’ (Francis 1993: 139). The questions and conclusions formulated by a researcher will be derived from what the corpus data actually reveal when subjected to statistical analysis; the researcher does not approach the data influenced by a priori ideas and claims. An illustration of the corpus-driven approach in L2 proficiency assessment is Wulff/Gries’s (2011) proposal to measure accuracy through a probabilistic analysis of lexico-grammatical association patterns. Accuracy is operationalised as the proficient selection of constructions in their preferred constructional context in a particular target genre, and native-like proficiency is seen as a “gradual, probabilistic phenomenon that transcends a native-nonnative speaker divide” (Wulff/Gries 2011: 61). While such methods have not yet been widely used in LTA, this kind of approach seems particularly promising and useful for a “text-centred” (Carlsen 2012: 165), data-driven classification of proficiency based on linguistic descriptors, such as those that are typical of a specific register, e.g. academic writing (see Section 4).
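
A rough numerical illustration of a lexico-grammatical association pattern is sketched below. It is not Wulff/Gries's procedure: a smoothed log odds ratio over a 2x2 table is swapped in as a simple stand-in association measure, and all counts, names and the two hypothetical constructions X and Y are invented for the example.

```python
import math


def log_odds_ratio(a, b, c, d, smoothing=0.5):
    """Log odds ratio for a 2x2 contingency table.

    a: target verb in construction X    b: target verb in construction Y
    c: other verbs in construction X    d: other verbs in construction Y
    Adding 0.5 to every cell (Haldane-Anscombe correction) avoids division by zero.
    """
    a, b, c, d = (cell + smoothing for cell in (a, b, c, d))
    return math.log((a * d) / (b * c))


def preference_gap(learner_table, reference_table):
    """Learner association strength minus reference association strength."""
    return log_odds_ratio(*learner_table) - log_odds_ratio(*reference_table)


# Invented counts in the order (a, b, c, d) described above.
reference_counts = (40, 10, 60, 90)  # the verb clearly prefers construction X
learner_counts = (20, 20, 80, 80)    # learners show no preference

gap = preference_gap(learner_counts, reference_counts)
print(f"preference gap: {gap:.2f}")
```

A gap near zero would indicate that learners attach the verb to its preferred construction about as strongly as the reference data do; a negative gap, as in this toy case, indicates a weaker preference. This treats accuracy as a matter of degree, in line with the gradual view of proficiency described above.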

3.   Methodological Challenges

Learner corpora have the potential to increase transparency, consistency and comparability in the assessment of L2 proficiency provided some methodological issues and challenges that pertain to their application in LTA can and will be tackled successfully. In this section, I discuss three issues and desiderata for future research:

Details

Pages: 358
Publication Year: 2016
ISBN (Softcover): 9783034315067
ISBN (PDF): 9783035107364
ISBN (MOBI): 9783035196405
ISBN (ePUB): 9783035196412
DOI: 10.3726/978-3-0351-0736-4
Language: English
Publication date: 2016 (January)
Keywords: corpus linguistics, cross-sectional learner corpora, language teaching and assessment, corpus compilation and representativeness, EFL, longitudinal learner corpora, error analysis, learner corpora
Published: Bern, Berlin, Bruxelles, Frankfurt am Main, New York, Oxford, Wien, 2015. 358 pp.
Product Safety: Peter Lang Group AG

Biographical notes

Erik Castello (Volume editor), Katherine Ackerley (Volume editor), Francesca Coccetta (Volume editor)

Erik Castello and Katherine Ackerley are tenured researchers and lecturers in English language and linguistics at the University of Padua, Italy, while Francesca Coccetta is a researcher and lecturer in English language and linguistics at Ca’ Foscari University of Venice, Italy. Their research interests include corpus linguistics and computer and Internet technology for language teaching and assessment. They currently use computer learner corpora to inform both their research and teaching.
