
Validating Analytic Rating Scales

A Multi-Method Approach to Scaling Descriptors for Assessing Academic Speaking


Armin Berger

This book presents a unique inter-university scale development project, with a focus on the validation of two new rating scales for the assessment of academic presentations and interactions. The use of rating scales for performance assessment has increased considerably in educational contexts, but empirical research investigating the effectiveness of such scales remains scarce. The author reports on a multi-method study designed to scale the level descriptors on the basis of expert judgments and performance data. The salient characteristics of the scale levels offer a specification of academic speaking, adding concrete detail to the reference levels of the Common European Framework. The findings suggest that validation procedures should be mapped onto theoretical models of performance assessment.

4 Rating scale validation



4.1   Validity and validity evidence

The notion of validity and the corresponding validation processes have changed over time. In the 1960s, validity was regarded as given if a test measured what it was supposed to measure and produced reliable results across administrations; in the 1970s, validity came to be discussed in relation to a number of test features assumed to contribute to it. Accordingly, different types of validity were distinguished, including the classical trio of ‘criterion-oriented’, ‘content’ and ‘construct validity’. Each type is associated with the kind of evidence presented to show that a test is valid. Criterion-oriented validity aims to demonstrate the relationship between a test and a criterion, either in the form of another test deemed valid or performance on some future criterion; content validity requires that the content of a test be representative of the domain being tested. Construct validity indicates the extent to which a test reflects an underlying theory of language use and the extent to which that theory can explain or predict test performance. These aspects were usually investigated separately. It was not until Messick’s (1989) seminal article on validity that the general perspective shifted from a divided to a unitary view of validity, one which also takes account of the consequences of a test. He defines validity as

an integrated evaluative judgement of the degree to which empirical...
