Studies in Learner Corpus Linguistics
Research and Applications for Foreign Language Teaching and Assessment
Table Of Contents
- Cover
- Title
- Copyright
- About the author
- About the book
- This eBook can be cited
- Contents
- Introduction
- Section 1: Learner Corpora for Language Teaching and Assessment
- Using Learner Corpora in Language Testing and Assessment: Current Practice and Future Challenges
- Dealing with Errors in Learner Corpora to Describe, Teach and Assess EFL Writing: Focus on Article Use
- Using Learner Corpora to Order Linguistic Structures in Terms of Apparent Difficulty
- Focus on Form in Computer-Mediated Communication: Using Written Learner Data to Foster Language and Pragmatic Skills in Communicative Contexts
- The Compilation and Use of a CMC Learner Corpus for Japanese University Students
- Section 2: Longitudinal Learner Corpus-based Studies
- Introduction to the LONGDALE Project
- Nouns and Noun Phrases in Advanced Dutch EFL Writing: From Quantitative to Qualitative Longitudinal Data Analysis
- I didn’t really *understood what it was about, but it really *made fun: A Longitudinal Corpus-based Study of Tense/Aspect and High-frequency Verbs in Learner English
- Assessing Advanced EFL Students’ Proficiency at Producing Affect-laden Discourse
- Towards a Longitudinal Study of Metadiscourse in EFL Academic Writing: Focus on Italian Learners’ Use of it-extraposition
- Short-term Effects of Students’ Exploration of Corpora: A Longitudinal Study of Pre- and Post-modification of Noun Phrases in Learner English
- Section 3: Language Corpora for the Analysis of Interlanguage
- Analysing the Language of Interpersonal Relations in Corpora of Elicited Learner and Native Speaker Interactions in English
- From Learner to Expert: Using a Corpus to Analyse the Use of Must by German Advanced Students of English
- Spanish Copulas and the Interlanguage of Iraqi University Students
- Phraseology in Academic L2 Discourse: The Use of Multi-words Units in a CMC University Context
- Section 4: Learner Corpus Compilation and Representativeness
- Representing Learner English in a Specialized Corpus: Genre and Proficiency Level in the Advanced Learner Corpus of Argumentative Student Essays (ALCASE)
- Connecting Data Elicitation and Pedagogical Practice in Learner Corpus Design: The Case of TILCE – the Turin Italian Learner Corpus of English
- A Generic Data Workflow for Building Annotated Text Corpora
- Notes on Contributors
Section 1:
Learner Corpora for Language Teaching and Assessment
Using Learner Corpora in Language Testing and Assessment: Current Practice and Future Challenges
1. Introduction
Learner Corpus Research (LCR) has contributed significantly to the description of advanced interlanguages, and many of its findings have resulted in useful applications for second/foreign (L2) language teaching and learning. Learner- and native-speaker corpora are also receiving increasing attention in the field of Language Testing and Assessment (LTA) to approach the construct of L2 proficiency. LTA as a subfield within Applied Linguistics subsumes a vast area of different assessment and testing contexts, in which the terms ‘testing’ and ‘assessment’ are often used interchangeably. The two terms both refer to “the systematic gathering of language-related behavior in order to make inferences about language ability and capacity for language use on other occasions” (Chapelle/Plakans 2013: 241). However, the term ‘assessment’ is often used to refer to a more varied process of data gathering and interpretation than ‘testing’, which applies to assessment practices in institutional contexts (Chapelle/Plakans 2013: 241). Focusing on productive skills, mainly writing, the present chapter discusses the ways in which learner corpora can be used for the assessment of L2 proficiency, highlighting their benefits, but also some challenges linked to their use. Proposing a threefold distinction of practical applications of corpora in LTA, I will point out some major methodological issues in LCR as they pertain to learner corpus compilation, analysis, and their use in LTA. The chapter closes with an example of how learner corpora can increase transparency, consistency and comparability in the assessment of L2 writing proficiency using a data-driven approach that is partially independent of human rating.
2. Learner Corpora in Language Testing and Assessment: Three Approaches
In Corpus Linguistics, there is a general distinction between corpus-based and corpus-driven approaches to the study of language. The former is sometimes narrowly considered
a methodology that avails itself of the corpus mainly to expound, test or exemplify theories and descriptions that were formulated before large corpora became available to inform language study (Tognini-Bonelli 2001: 65).
From this point of view, a corpus is used as evidence to corroborate pre-existing linguistic description, e.g. as a source of examples to check researchers’ intuition or to examine the frequency of occurrence of a specific linguistic phenomenon. By contrast, corpus-driven approaches make minimal prior assumptions about language structure and are said to be more ‘inductive’, since the corpus itself is the data and the patterns of language use found in the corpus become the basis for defining regularities and exceptions in language. While this seemingly clear distinction is probably overstated (McEnery/Xiao/Tono 2006: 8), ‘corpus-based’ is the more general term of the two because it is often used in a much wider sense than the narrow definition given above, referring to any work that makes use of a corpus.
Callies/Díez-Bedmar/Zaytseva (2014) propose a threefold distinction of how learner corpora have been and can be applied in the field of LTA. They suggest three criteria to classify applications of learner corpora in LTA as CORPUS-INFORMED, CORPUS-BASED and CORPUS-DRIVEN: (1) the way corpus data are actually put to use, (2) the aims and outcomes for LTA, and (3) the degree of involvement of the researcher in data retrieval, analysis and interpretation. Callies/Díez-Bedmar/Zaytseva (2014) stress that these are not strict distinctions but that the three approaches may overlap or even merge in some practices.
In CORPUS-INFORMED applications, often the only type of approach discussed in most overview surveys of the use of learner corpora in LTA (e.g. Barker 2010, 2013), learner corpora are used to inform test content or to validate human raters’ claims (e.g. Alderson 1996; Barker 2010, 2013; Taylor/Barker 2008). In other words, “corpora can reveal what language learners can do, which informs both what is tested at a particular proficiency level and how this is rated” (Barker 2013: 1361). Barker (2013: 1360) also points out that “corpora and related techniques can be used throughout the cycle of planning, developing, delivering, and rating a language test”. She identifies three main ways in which corpora are used in LTA: defining user needs and test purpose, designing tests, and refining task rating. Barker (2013: 1361) also notes a “steady increase in language testers’ willingness to engage with corpora in relation to designing, validating and rating language tests” and that “corpora are being developed and used specifically for language testing and that these uses are being argued for and critiqued within the field rather than being adopted by all language testers” (Barker 2013: 1362).
In CORPUS-BASED approaches, corpus data are used to explore learner language, often – but not necessarily – comparing it to the language of native speakers (NS), in search of empirical evidence confirming or refuting a researcher’s hypothesis. Recently, researchers have turned to learner corpora to inform, validate, and advance the way proficiency is operationalized in the Common European Framework of Reference for Languages (CEFR; Council of Europe 2001). For example, the work by Hawkins/Filipovic (2012) on so-called “criterial features” in L2 English aims to make the information on proficiency levels more explicit by adding “grammatical and lexical details of English to CEFR’s functional characterisation of the different levels” (Hawkins/Filipovic 2012: 5). Their approach involves comparisons of particular linguistic features used by learners and NSs in two corpora: the Cambridge Learner Corpus (CLC) composed of exam scripts produced at different proficiency levels, and a reference corpus of NS English, the British National Corpus (BNC). Depending on similarities and differences in usage patterns across these corpora, linguistic features acquire the status of either positive or negative linguistic properties respectively, and are interpreted as criterial features that are “characteristic and indicative of L2 proficiency at each level, on the basis of which examiners make their practical assessments” (Hawkins/Filipovic 2012: 6). Similarly, another strand of research within LCR combines both the CEFR proficiency levels and Computer-aided Error Analysis to provide insights into those areas that are problematic at various proficiency levels to validate human ratings (e.g. Díez-Bedmar 2011; Thewissen 2013).
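The kind of cross-corpus comparison on which such corpus-based work rests can be illustrated with a minimal sketch. It is not the procedure used by Hawkins/Filipovic (2012); it only assumes that raw counts of one linguistic feature and the token sizes of each sub-corpus are already available, and it normalises them to a per-million rate so that proficiency levels can be set against a native-speaker reference. The function names, the level labels and all figures are hypothetical and serve illustration only.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw count to occurrences per million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


def compare_to_reference(learner_counts, learner_sizes, ref_count, ref_size):
    """Return, per proficiency level, the normalised rate of a feature
    and its ratio to the rate in the reference corpus."""
    ref_rate = per_million(ref_count, ref_size)
    comparison = {}
    for level, count in learner_counts.items():
        rate = per_million(count, learner_sizes[level])
        # The ratio is undefined if the feature never occurs in the reference.
        ratio = rate / ref_rate if ref_rate > 0 else None
        comparison[level] = {"rate": rate, "ratio_to_reference": ratio}
    return comparison


if __name__ == "__main__":
    # Invented figures for one feature; not taken from the CLC or the BNC.
    counts = {"B1": 50, "B2": 120, "C1": 300}
    sizes = {"B1": 200_000, "B2": 300_000, "C1": 500_000}
    for level, row in compare_to_reference(counts, sizes, 500, 1_000_000).items():
        print(level, row)
```

A ratio approaching 1 at the higher levels would suggest convergence on native-speaker usage for that feature. Whether a feature counts as criterial remains a matter of interpretation that such a calculation can only inform.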
Finally, CORPUS-DRIVEN approaches presuppose the least degree of involvement by the researcher, since they rely on computer techniques for data extraction and evaluation. Here, ‘corpus-driven’ truly means ‘data-driven’ (Francis 1993: 139). The questions and conclusions formulated by a researcher will be derived from what the corpus data actually reveal when subjected to statistical analysis; the researcher does not approach the data influenced by a priori ideas and claims. An illustration of the corpus-driven approach in L2 proficiency assessment is Wulff/Gries’s (2011) proposal to measure accuracy through a probabilistic analysis of lexico-grammatical association patterns. Accuracy is operationalised as the proficient selection of constructions in their preferred constructional context in a particular target genre, and native-like proficiency is seen as a “gradual, probabilistic phenomenon that transcends a native-nonnative speaker divide” (Wulff/Gries 2011: 61). While such methods have not yet been widely used in LTA, this kind of approach seems particularly promising and useful for a “text-centred” (Carlsen 2012: 165), data-driven classification of proficiency based on linguistic descriptors, such as those that are typical of a specific register, e.g. academic writing (see Section 4).
3. Methodological Challenges
Learner corpora have the potential to increase transparency, consistency and comparability in the assessment of L2 proficiency provided some methodological issues and challenges that pertain to their application in LTA can and will be tackled successfully. In this section, I discuss three issues and desiderata for future research:
Details
- Pages: 358
- Publication Year: 2016
- ISBN (Softcover): 9783034315067
- ISBN (PDF): 9783035107364
- ISBN (MOBI): 9783035196405
- ISBN (ePUB): 9783035196412
- DOI: 10.3726/978-3-0351-0736-4
- Language: English
- Publication date: 2016 (January)
- Keywords: corpus linguistics cross-sectional learner corpora language teaching and assessment corpus compilation and representativeness EFL longitudinal learner corpora error analysis learner corpora
- Published: Bern, Berlin, Bruxelles, Frankfurt am Main, New York, Oxford, Wien, 2015. 358 pp.
- Product Safety: Peter Lang Group AG