The Development of a Common Framework Scale of Language Proficiency


Brian North

Scales describing language proficiency in a series of levels can provide orientation for educational programmes, criteria for assessment, and reporting to stakeholders. However, in most cases such instruments are produced on the basis of expert opinion alone. A scale of language proficiency actually implies a descriptive scheme related to theory but usable by practitioners. It also implies a methodology for scaling content to different levels. This book describes the use of both qualitative and quantitative techniques to develop scales for the «Common Reference Levels» in the Common European Framework of Reference for modern languages. Short stand-alone descriptors were (i) developed and classified, (ii) refined and elaborated in workshops, and then (iii) scaled by analyzing the judgments of one hundred teachers on the English language proficiency of the learners in their classes.


7. Interpreting the Scale


212 descriptors had now been calibrated to estimated difficulties on a common logit scale running from -5.68 to 4.68. Having successfully constructed a scale of items, the next step was to investigate it, firstly in order to check that it indeed made sense, that similar content was calibrated in a coherent fashion, and secondly in order to present it in a form in which it could be meaningful to other users. In effect this meant dividing the scale up into a number of bands or levels, which in turn involved setting cut-off points, and then seeing (a) whether those levels had coherent content, (b) whether progress up the scale in each category was logical.

Setting Cut-offs between Levels

The number of levels or strata which can be identified in a set of data is connected to the question of reliability. Pollitt explains a calculation with which one can derive the decision capability from any test (or rating) from its reliability coefficient (Pollitt 1991: 90). Fisher (1992) offers similar information and his table and Pollitt's compare as in Table 7.1.

Table 7.1: Reliability and the Number of Strata in Data

r       Pollitt 1991    Fisher 1992
        Bands           Distinct Strata
0.98                    9
0.97                    8
0.96    10.1            7
0.94                    5
0.90    6.3             4
0.80    4.3             3
0.70                    2
0.50                    1

The reliability statistic for the full integrated analysis (simulated Cronbach Alpha) was 0.97, which, according to Pollitt...
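
The link between a reliability coefficient and the number of distinguishable strata can be illustrated with a short calculation. The excerpt does not state which formula Pollitt or Fisher used, so the sketch below rests on an assumption: it applies the separation index G = sqrt(r / (1 - r)) and the strata formula H = (4G + 1) / 3 that are commonly cited in Rasch measurement. The function names are hypothetical, and the results need not match the tabled values exactly, since published tables round in their own ways.

```python
import math


def separation_index(reliability: float) -> float:
    """Separation index G = sqrt(r / (1 - r)).

    Assumption: this is the standard Rasch-measurement definition,
    not a formula quoted in the excerpt.
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must lie in the interval [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


def strata(reliability: float) -> float:
    """Number of statistically distinct strata, H = (4G + 1) / 3.

    Assumption: the commonly cited strata formula; the source may
    have used a different calculation.
    """
    return (4.0 * separation_index(reliability) + 1.0) / 3.0


if __name__ == "__main__":
    # Reliability values taken from the r column of Table 7.1.
    for r in (0.50, 0.70, 0.80, 0.90, 0.94, 0.96, 0.97, 0.98):
        print(f"r = {r:.2f}  G = {separation_index(r):.2f}  H = {strata(r):.2f}")
```

Under these assumptions a reliability of 0.80 corresponds to a separation index of 2 and to 3 strata, and a reliability of 0.90 to a separation index of 3 and about 4.3 strata.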

