I. Methodological issues in corpus linguistic analyses of variability
Embracing Bayes Factors for key item analysis in corpus linguistics

Andrew Wilson

Introduction

The key item methodology is one of the most widely used tools in modern corpus linguistics. Its goal is to highlight those lexical items – or other linguistic constructs such as part-of-speech categories or semantic fields – which are most distinctive of one text or corpus when compared against another. In other words, it sets out to identify the main elements of variability between two (or sometimes more) varieties, authors, texts, etc. When used in relation to lexical items, it is more commonly known as the keywords methodology.

However, this methodology is not without its difficulties. For instance, the uneven dispersions of items across parts of a text or corpus (Leech, Rayson & Wilson 2001; Gries 2008) and the actual magnitudes of any frequency differences that are discovered (Gries 2005) have both been highlighted as complicating factors in interpreting the results of key item analyses. In this short paper, I should like to focus on a more basic misunderstanding in relation to the key item methodology and on one possible solution to it.

Misunderstanding key items

Although it has been given some aura of novelty by the use of terms such as "keyness" in certain software implementations, the key item methodology is actually nothing more than an ordinary null hypothesis significance test applied to the frequencies of words or other items in two texts or corpora. Most commonly, the underlying test is based on a...
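The characterisation of key item analysis as an ordinary null hypothesis significance test on item frequencies can be illustrated with a minimal sketch. The excerpt breaks off before naming the underlying test, so the log-likelihood ratio (G2) statistic on a 2x2 contingency table is used here purely as an assumption for illustration; the function name `key_item_g2` and the example counts are hypothetical and not taken from the chapter.

```python
import math


def key_item_g2(freq_a, size_a, freq_b, size_b):
    """Return (G2, p) for one item's frequency in two corpora.

    Assumption: a 2x2 log-likelihood ratio test with 1 degree of freedom.
    The null hypothesis is that the item has the same relative frequency
    in both corpora.
    """
    # Observed 2x2 table: rows = item / all other tokens, columns = corpus A / B.
    observed = [
        [freq_a, freq_b],
        [size_a - freq_a, size_b - freq_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [size_a, size_b]
    grand_total = size_a + size_b

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            # Cells with an observed count of zero contribute nothing to G2.
            if observed[i][j] > 0:
                g2 += observed[i][j] * math.log(observed[i][j] / expected)
    g2 *= 2.0

    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(g2, 0.0) / 2.0))
    return g2, p_value


# Hypothetical counts: an item occurring 100 times in a 10,000-token corpus
# and 10 times in another 10,000-token corpus.
g2, p = key_item_g2(100, 10_000, 10, 10_000)
print(f"G2 = {g2:.2f}, p = {p:.3g}")
```

Under this sketch, an item would be flagged as "key" whenever the p-value falls below a chosen threshold, which is exactly the ordinary significance-testing logic the paragraph describes.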