Learner corpus profiles

The case of Romanian Learner English

by Madalina Chitez (Author)
Monographs 244 Pages
Series: Linguistic Insights, Volume 173


Aiming to exemplify the methodology of learner corpus profiling, this book describes salient features of Romanian Learner English. As a starting point, the volume offers a comprehensive presentation of Romanian-English contrastive studies. Another innovative aspect of the book is its use of the first Romanian Corpus of Learner English, whose compilation is the subject of a methodological discussion. In one of the main chapters, the book introduces the methodology of learner corpus profiling and compares it with existing approaches. The profiling approach is illustrated by corpus-based quantitative and qualitative investigations of Romanian Learner English. Part of the investigation is dedicated to the lexico-grammatical profiles of articles, prepositions and genitives. The frequency-based collocation analyses are integrated with error analyses and extended into samples of error patterns. Furthermore, contrasting typical Romanian Learner English constructions with examples from German and Italian learner corpora opens the path to new contrastive interlanguage analyses.

Table of Contents

  • About the Author
  • About the Book
  • Acknowledgements
  • 1. Introduction
  • 1.1 Rethinking learner corpus profiles
  • 1.2 Why Romanian Learner English?
  • 1.3 The first Romanian Corpus of Learner English
  • 1.4 Some issues on ‘correctness’
  • 1.5 Outline of the book
  • 2. Romanian Learner English: research scope
  • 2.1 English as a Foreign Language in Romania
  • 2.2 Romanian Corpus of Learner English
  • 2.2.1 Compilation
  • 2.2.2 Problems and solutions
  • 2.2.3 Comparable corpora
  • 2.3 Tools
  • 2.4 Research strategy
  • 2.4.1 English as a Foreign Language
  • 2.4.2 Contrastive Analysis
  • 2.4.3 Error Analysis
  • 2.4.4 Computer Learner Corpora
  • 2.4.5 Contrastive Interlanguage Analysis
  • 2.5 From contrastive analysis to learner corpus approaches
  • 2.5.1 Phonetics and phonology
  • 2.5.2 Grammar
  • 2.5.3 Lexicology and Semantics
  • 2.5.4 Semiotics
  • 2.6 Potential and limitations of RECAP compared to RoCLE
  • 2.7 Other Romanian-English contrastive studies
  • 2.8 RoCLE versus ICLE
  • 2.9 RoCLE importance
  • 3. Learner corpus profiles
  • 3.1 Theoretical background
  • 3.2 Learner corpus profiles: a new definition
  • 3.3 Lexical Frequency Profile
  • 3.3.1 The teddy-bear argument
  • 3.3.2 Noun preferences
  • 3.3.3 Verb preferences
  • 3.4 Grammatical Frequency Profile
  • 3.5 The way to the Lexico-Grammatical Frequency Profile
  • 3.6 Conclusion
  • 4. Articles in Romanian Learner English
  • 4.1 Introduction
  • 4.2 The article system in English versus Romanian
  • 4.2.1 Article use in English: the and a/an
  • 4.2.2 Article use in Romanian from the EFL perspective
  • 4.2.3 Article-related anticipated problems in RoCLE
  • 4.2.4 Romanian Learner English in the CIA framework
  • 4.3 Lexico-grammatical profiles of the articles in RoCLE: main features
  • 4.3.1 Basic frequencies
  • 4.3.2 The definite article in RoCLE: frequency, collocations, clusters
  • 4.3.3 The indefinite article in RoCLE: frequency, collocations, clusters
  • 4.4 Overuse and underuse of articles in Romanian Learner English
  • 4.4.1 Specific-reference with ‘the’
  • 4.4.2 Superlatives and numerals with ‘the’
  • 4.4.3 New information after indefinite articles
  • 4.4.4 Indefinite article use after the <be> paradigm
  • 4.4.5 Indefinite articles after ‘what a’, ‘such a’
  • 4.5 Specific pattern-related errors
  • 4.5.1 The of-construction with definite articles
  • 4.5.2 ‘The’ with proper nouns
  • 4.6 Learner corpora in contrast
  • 4.6.1 ‘The-flooding’ or ‘L2-leaking’?
  • 4.6.2 Reference to something already mentioned: ‘the’ or ‘this’?
  • 4.6.3 From ‘Şi…-a/l/i/le’ to ‘And the…’
  • 4.7 Conclusion
  • 5. Prepositions in Romanian Learner English
  • 5.1 Introduction
  • 5.1.1 Prepositions in EFL
  • 5.1.2 Prepositions in learner corpora
  • 5.2 Frequency analysis
  • 5.2.1 Simple prepositions
  • 5.2.2 Simple prepositions: grammatical profiles
  • 5.2.3 Complex prepositions
  • 5.2.4 Complex prepositions: grammatical profiles
  • 5.3 A parallel study: prepositions in RECAP and RoCLE
  • 5.4 Multiple-function prepositions in RoCLE
  • 5.4.1 The use of IN in RoCLE
  • 5.4.2 The use of ON in RoCLE
  • 5.4.3 The use of TO in RoCLE
  • 5.4.4 The use of WITH in RoCLE
  • 5.4.5 The use of AT in RoCLE
  • 5.4.6 The use of BY in RoCLE
  • 5.5 Contrastive Interlanguage Analysis: L1-specific or universal errors?
  • 5.5.1 Article annihilation by prepositions
  • 5.5.2 Universal confusions
  • 5.6 Conclusion
  • 6. Genitives in Romanian Learner English
  • 6.1 Introduction
  • 6.2 Contrastive analysis of the genitive system in Romanian and English
  • 6.3 The s-genitive constructions in RoCLE
  • 6.3.1 Automatic analysis of the s-genitives in RoCLE: technical problems
  • 6.4 The of-genitive constructions in RoCLE
  • 6.4.1 Post- and pre-modifiers of the of-genitive in RoCLE
  • 6.4.2 Basic of-genitive functions in RoCLE
  • 6.5 Other types of genitive constructions in RoCLE
  • 6.6 Genitives in learner corpora
  • 6.6.1 Overall frequencies
  • 6.7 Conclusion
  • 7. Summary
  • 7.1 Major achievements and findings
  • 7.2 The learner corpus profile: summarizing discussion
  • 7.2.1 The Lexical Frequency Profile
  • 7.2.2 The Grammatical Frequency Profile
  • 7.2.3 The Lexico-Grammatical Frequency Profiles
  • 7.2.4 The Contrastive Interlanguage Analysis Profiles
  • 7.3 Romanian Learner English: back to ‘classroom’
  • 7.4 Further research
  • References


1.1 Rethinking learner corpus profiles

A rich body of literature supports the idea that computer learner corpora (CLC) facilitate both authentic-data descriptions of specific interlanguage types and the identification of their pedagogical implications (Granger 1998). In our view, the major purposes of CLC research can be achieved more easily once a lexico-grammatical profile (O’Keeffe, McCarthy et al. 2007) of the corpus has been determined. Consequently, the interdependence between a freshly compiled learner corpus, such as the Romanian Corpus of Learner English (RoCLE), and the learner corpus profiling approach seems inevitable.

Nevertheless, defining a learner corpus profile proves to be a challenging task, as the term can refer both to an integrated type of investigation, i.e. multiple features of word categories within the corpus (Granger and Rayson 1998), and to particular analyses, i.e. multiple features of specific items within the corpus (Tognini-Bonelli 2001). The terminology used by Crystal (1991) associates ‘profiling’ with the identification of the most salient features of a particular person (clinical linguistics) or register (stylistics). From this starting point, general profiles of the corpus can be constructed according to the components in focus: grammatical, lexical, semantic, etc. The sum of these separate ‘profiles’ can then yield a learner corpus profile.

The other option is to create a lexico-grammatical profile of a word or word category and its typical “contexts of use” (O’Keeffe, McCarthy et al. 2007: 14) in connection with collocates, chunks/idioms, syntactic restrictions, semantic restrictions, prosody, etc. The formula chosen in the present study strikes a fair balance between the two profiling strategies. To avoid a terminological debate, we use ‘learner corpus profile’ in free variation with ‘lexico-grammatical profile’, because lexico-grammatical aspects are the core elements of our learner-corpus-profile approach.

At the same time, the domain of lexico-grammar is itself debatable. It emerged from the “intriguing evidence of the interrelatedness of vocabulary and syntax” provided by “researchers in corpus linguistics and neighbouring fields” (Römer and Schulze 2009: 1). This interconnection has been explored within lexico-grammatical approaches that centre on cross-category topics such as collocation (Sinclair 1991), colligation (Sinclair 1996), pattern grammar (Hunston and Francis 2000), lexical bundles (Biber 2004), collostructions (Stefanowitsch and Gries 2003), etc. They all refer to clusters/patterns/chunks that can be identified and analysed on the basis of both lexical and grammatical (morpho-syntactico-semantic) distinctions.

In interlanguage varieties, the lexico-grammatical approach is highly relevant, since it connects almost automatically to the field of EAP (English for Academic Purposes) phraseology. Even if the RoCLE analysis does not necessarily focus on typical EAP phrases, it will provide inventories of significant collocates/clusters for the selected grammatical categories (articles, prepositions, genitives). CLC lexico-grammar research thus covers not only learner-corpus lexical/grammatical frequency profiles but also learner-corpus lexico-grammatical category profiles. In addition, “the lexis-grammar interface” (Römer and Schulze 2009) yields noticeable phraseology-related results. Since most EAP investigations can be categorized according to whether they deal with idiomatic, half-idiomatic or simple collocational phrases, the study of Romanian Learner English can be associated with the last two categories. By quantifying and interpreting the use of problematic syntactic elements and their co-dependent clusters/collocations, the RoCLE-based findings ultimately aim to contribute to improving Romanian students’ written production in English.

Such a holistic approach to learner corpus research broadly involves detecting and extracting the essential features of the corpus. Unlike the general lexical and grammatical frequency profiles, which are based exclusively on lexical and POS-category frequencies respectively, the detailed profiles of selected grammatical classes include both quantitative and qualitative lexico-grammatical characteristics. In fact, creating a lexico-grammatical profile for an entire grammatical class (e.g. articles) is genuinely innovative: it should fill an existing gap among profiling research categories, which range from entirely automatic lexical corpus profiles (Laufer 2005) or grammatical corpus profiles (Granger and Rayson 1998) to semi-automatic target-cluster profiling (Garretson 2009) or specific word profiling (O’Keeffe, McCarthy et al. 2007).
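To make the contrast between a general frequency profile and the profile of a single grammatical item more concrete, the following minimal Python sketch computes both for a short text. It is an illustration under stated assumptions, not the procedure used for RoCLE: it assumes already tokenized input, treats only the immediate right-hand word as a collocate, and counts only three-word clusters opened by the article. The function names `lexical_frequency_profile` and `article_profile` are hypothetical.

```python
from collections import Counter


def lexical_frequency_profile(tokens):
    """General lexical profile: raw word frequencies only (case-folded)."""
    return Counter(t.lower() for t in tokens)


def article_profile(tokens, article="the", cluster_len=3):
    """Hypothetical lexico-grammatical profile of one article.

    Assumption: tokens are pre-tokenized words; a 'collocate' is the
    word immediately to the right of the article, and a 'cluster' is a
    fixed-length word sequence that starts with the article.
    """
    words = [t.lower() for t in tokens]
    collocates = Counter()
    clusters = Counter()
    frequency = 0
    for i, word in enumerate(words):
        if word != article:
            continue
        frequency += 1
        # Right-hand collocate, if the article is not the last token.
        if i + 1 < len(words):
            collocates[words[i + 1]] += 1
        # Cluster opened by the article, if enough tokens remain.
        if i + cluster_len <= len(words):
            clusters[" ".join(words[i:i + cluster_len])] += 1
    return {"frequency": frequency, "collocates": collocates, "clusters": clusters}


if __name__ == "__main__":
    sample = "The students use the article in the same way as the students in Romania".split()
    print(lexical_frequency_profile(sample).most_common(3))
    print(article_profile(sample))
```

The lexical profile only reports that *the* occurs four times in the sample, whereas the article profile also shows which words and clusters it combines with; a realistic profile would add POS tags, wider collocation windows and association measures rather than raw counts.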

1.2 Why Romanian Learner English?

Romanian Learner English (RLE) is a complex and relatively new domain, considering that English has been extensively taught and learned in Romania only in the last fifteen to twenty years. In our opinion, its importance has been growing gradually because of education and labour-market demands, which are directly influenced by globalization tendencies throughout the world. In a constantly changing educational system, and society in general, it is nevertheless rather difficult to evaluate the characteristics and status of a specific interlanguage type such as Romanian Learner English.

In addition, existing studies have approached the topic of Romanian students learning English almost exclusively from the perspective of Romanian-English Contrastive Analysis (see Chiţoran 1975). Despite their attempts to contribute to English teaching methodology, these studies have addressed the RLE topic only from a prescriptive standpoint:

…results are therefore extremely significant both for the orientation of English-Romanian contrastive research work, and for the share which certain drills must have in practice, even with advanced years (or, in some cases, especially with advanced years), drills concerning mastering of English syntax in its subtlest nuances, with a special stress on syntagms (the verb group, the noun group and the phrase), which for native Romanian students raise problems of selecting the elements which can be combined, as well as their order in collocation. (Slama-Cazacu and Dutescu-Coliban 1977: 278)

In other words, the major findings of the Romanian ‘old school’ EFL studies concerned the identification of problematic grammatical categories and of the most significant cases of Romanian-English interference. However, quantitative evidence for the described phenomena was not collected, apart from a few small-scale experimental studies. A direct comparison with native speakers’ productions was also completely missing. That is why, until now, Romanian Learner English problems have been restricted to theoretical studies. This tendency has been more or less reflected in the teaching methodology and materials used in Romanian classrooms, which has led to the establishment of somewhat obsolete EFL norms. In this context, the RoCLE approach, based on learner language data, has to identify and select the ‘real’ RLE problems, which are not so much textbook-related (i.e. English in the classroom) as usage-dependent.

Indeed, a closer look at the RoCLE texts suggests that the complexity of Romanian Learner English cannot be reduced to lists of interlanguage errors, grammar rules or false friends:

To sum up there is no system that is perfect but [if university systems would mind] <1> [the societies demand] <2> and [place its emphasis on] <3> useful skills, then surely a degree would be [more valuable that] <4> [is actually considered] <5>. However I believe that too many young people leave university without having the necessary qualifications to find a job. A student should firstly make sure that the university degree is of use to him in the outside world. [By this I mean] <6> that universities should place greater emphasis on vocational skills, which could be readily adapted for use in a job. [As a result people would be still educated, but their qualification would be suitable to allow them to embark on a career] <7>. (<ROCLE-AIC-0017.4>)

Table 1. RLE specific samples in RoCLE.

From the extract above alone, several important features can be identified: (a) certain grammar rules are quite frequently disregarded, which may lead to interpretable errors (tense use in conditional sentences in <1>) or serious ones (genitive form in <2>, conjunction confusion in <4>, subject omission in <5>); (b) some of the expressions/phrases used by the students mirror Romanian expressions and, even when they are correct, they are used in a different context and more frequently than by native students (En. place its emphasis on > Ro. a pune accentul pe in <3>); (c) students use spoken-register markers in their writing (<6>); (d) some formulations are either ambiguous or ‘artificial’ NNS constructions (<7>).

Considering the examples above (a random analysis), some questions arise regarding the investigation possibilities that RoCLE offers. For incorrect structures, ambiguities and register inappropriacy, an error-tagged version of the corpus would be required. Since this is not available, the only remaining option is to combine EA case studies (NS and NNS corrections) with sample frequency analyses. In the end, this is the only way to correctly assess instances of Romanian-English interference, because a native-speaker error correction would most likely not be able to distinguish between errors typical of Romanian learners and universal learner errors.


ISBN (Softcover)
Publication date: 2014 (August)
Bern, Berlin, Bruxelles, Frankfurt am Main, New York, Oxford, Wien, 2014. 244 pp., num. fig.

Biographical notes

Madalina Chitez (Author)

Madalina Chitez holds a doctoral degree in English Linguistics from the University of Freiburg, Germany. She obtained her bachelor’s degree in foreign languages from Transilvania University of Brasov, Romania. She currently works as a senior researcher at Zurich University of Applied Sciences in Switzerland. Her research interests are corpus linguistics, learner corpora, academic writing and intercultural rhetoric.

