Edited by Piotr Pezik
CzeSL – an Error Tagged Corpus of Czech as a Second Language: Barbora Štindlová, Alexandr Rosen, Jirka Hana and Svatava Škodová
Abstract

Using an error-annotated learner corpus as the basis, this paper has two goals: (i) to evaluate the practicality of the annotation scheme by computing inter-annotator agreement on a non-trivial sample of data, and (ii) to find out whether automated linguistic annotation tools (taggers, spell checkers and grammar checkers), applied to learner text, can viably substitute for manual annotation.

Keywords: learner corpus, error annotation, second language acquisition

Introduction

Texts produced by non-native speakers are a precious source of information about how learners acquire a language and about second language acquisition in general. Collections of such texts, known as learner corpora, can be annotated like other corpora, with morphosyntactic categories or syntactic structure. However, what makes them most interesting are the instances of deviant usage, which can be corrected and assigned a tag specifying the type of error. Annotation of this kind is challenging, even more so for a language such as Czech, with its rich inflection, derivation and agreement, and a constituent order largely driven by information structure.

The present work is based on a project aimed at building a learner corpus with errors manually corrected and labelled within a three-level annotation scheme. Manual annotation is supplemented by morphosyntactic tags assigned to the hand-corrected input by a tagger, and by additional error tags whenever they can be derived automatically. Options to...
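The abstract's first goal, measuring inter-annotator agreement, is often quantified with Cohen's kappa, which corrects the raw share of matching decisions for the agreement two annotators would reach by chance. This excerpt does not say which measure the authors used, so the following is only a minimal Python sketch, assuming two annotators who each assign exactly one error tag to every token of the same text. The tag labels in the example are hypothetical and are not the CzeSL tagset.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(tags_a: Sequence[str], tags_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who tagged the same tokens.

    Each sequence holds one tag per token, aligned by position.
    """
    if len(tags_a) != len(tags_b) or len(tags_a) == 0:
        raise ValueError("expected two non-empty, equally long tag sequences")
    n = len(tags_a)
    # Observed agreement: fraction of tokens given the same tag by both annotators.
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Chance agreement: product of each annotator's marginal tag proportions,
    # summed over tags (a Counter returns 0 for tags one annotator never used).
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    p_e = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same tag everywhere; kappa is
        # undefined (0/0), so return 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical tags for six tokens; "none" marks a token judged error-free.
    annotator_1 = ["none", "infl", "none", "spell", "none", "order"]
    annotator_2 = ["none", "infl", "spell", "spell", "none", "none"]
    print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.5
```

Here the annotators agree on 4 of 6 tokens (observed agreement 2/3), chance agreement is 1/3, and kappa is (2/3 − 1/3) / (1 − 1/3) = 0.5. Treating "none" as an ordinary category, as this sketch does, counts a token that only one annotator marked as a disagreement; computing a separate kappa per error type on binary present/absent decisions is one possible alternative when some error types are rare.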