Corpus Data across Languages and Disciplines

Edited by Piotr Pęzik

In recent years, corpus tools and methodologies have gained widespread recognition in various areas of theoretical and applied linguistics. Data stored in corpora is explored and exploited across languages and in disciplines as distinct as historical linguistics, language didactics, discourse analysis, machine translation and search engine development, to name but a few. This volume contains a selection of papers presented at the 8th edition of the Practical Applications in Language and Computers conference. It is aimed at helping a wide community of researchers, language professionals and practitioners keep up to date with new corpus theories and methodologies, as well as with language-related applications of computational tools and resources.


Towards the PELCRA Learner English Corpus: Piotr Pęzik

Extract

Abstract

The PELCRA Learner English Corpus (PLEC) is a research project funded by a grant from the Polish Ministry of Science and Higher Education. Launched in December 2010, the project aims to compile an annotated corpus featuring a time-aligned spoken component for quantitative and qualitative analyses of Polish learner English. This paper describes the general design of the corpus, including the different layers of linguistic and error annotation. Special emphasis is placed on data exploration and analysis methods for investigating the phraseological competence of Polish learners of English. The paper also introduces some exploratory corpus mining methods, such as lexicogrammatical pattern analysis and graph-based visualization of learners’ phraseological competence.

Keywords

Learner corpora, English, Polish, data exploration, text mining, error annotation

Introduction

The PELCRA Polish Learner English Corpus (PLEC) is a research project funded by a grant from the Polish Ministry of Science and Higher Education. The project was launched in December 2010 and aims to compile a linguistically annotated corpus featuring a time-aligned spoken component for quantitative and qualitative analyses of Polish learner English. The present paper describes the design and contents of the corpus as well as the different layers of linguistic and error annotation. The PLEC project has a number of research objectives which can be pursued as the annotated corpus of learner English develops. In addition to using standard techniques of corpus data analysis at the phonetic, lexical and syntactic levels, which rely on explicit error...
