
Introduction to Many-Facet Rasch Measurement

Analyzing and Evaluating Rater-Mediated Assessments. 2nd Revised and Updated Edition


Thomas Eckes

Since the early days of performance assessment, human ratings have been subject to various forms of error and bias. Expert raters often assign different ratings to the very same performance, so that assessment outcomes can depend heavily on which raters happen to provide them. This book provides an introduction to many-facet Rasch measurement (MFRM), a psychometric approach that establishes a coherent framework for drawing reliable, valid, and fair inferences from rater-mediated assessments, thus addressing the problem of fallible human ratings. Revised and updated throughout, the Second Edition includes a stronger focus on the Facets computer program, emphasizing the pivotal role that MFRM plays in validating the interpretations and uses of assessment outcomes.

6. Analyzing the Examinee Facet: From Ratings to Fair Scores

Observed scores in most instances should not be taken at face value. This is especially important when high-stakes decisions about examinees are involved. The present chapter deals with measurement results for the examinee facet, focusing on ways to ensure fairness of the assessment under conditions of substantial rater variability. Building again on the sample data, the chapter first illustrates the use and interpretation of examinee fit statistics and then goes on to elaborate on the adjustment of observed scores in order to compensate for between-rater severity differences. It is argued that adjusted or fair scores more dependably reflect examinee proficiency, and do so both at the aggregate score level and at the level of individual criterion ratings.
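The adjustment described above can be illustrated with a minimal sketch. The following code assumes a Rasch rating scale model with illustrative threshold and severity values (none of which come from the book's sample data, and the function names are not the Facets implementation): a fair score replaces the individual rater's severity with the mean severity of the rater facet when computing the model-expected rating.

```python
import math

def category_probs(theta, severity, thresholds):
    """Rating scale model: probabilities of categories 0..m for an
    examinee with measure `theta` rated by a rater with the given
    `severity` (all values in logits)."""
    # Cumulative sums of (theta - severity - tau_k) give the
    # unnormalized log-odds of each category.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - severity - tau))
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_score(theta, severity, thresholds):
    """Model-expected rating on the 0..m scale."""
    probs = category_probs(theta, severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))

# Illustrative values (hypothetical, not from the sample data):
thresholds = [-1.5, 0.0, 1.5]   # Rasch-Andrich thresholds, 0-3 scale
theta = 1.0                      # examinee proficiency measure
harsh_rater = 0.8                # severity of the rater who rated
mean_severity = 0.0              # rater facet centered at zero

observed_expectation = expected_score(theta, harsh_rater, thresholds)
fair_expectation = expected_score(theta, mean_severity, thresholds)
# Because the fair score substitutes the facet mean for the harsh
# rater's severity, fair_expectation exceeds observed_expectation.
```

The same substitution works at the level of individual criterion ratings: holding the examinee measure fixed and setting each remaining facet element to its mean yields a fair expected rating per criterion.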

6.1    Examinee measurement results

As shown in the Wright map (Figure 4.1), there was wide variation in examinees’ writing proficiency. The proficiency estimates had a 14.93-logit spread, more than three times the spread observed for the rater severity estimates (4.64 logits). Moreover, the separation statistics confirmed that the examinees were well differentiated according to their level of writing proficiency: The examinee separation or strata index was 4.55, with an examinee separation reliability of .91 (see Table 4.2).
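The reported separation statistics hang together via the standard Rasch formulas, which the following sketch makes explicit (the helper function and the input values other than the reported .91 reliability are illustrative assumptions, not figures from Table 4.2):

```python
import math

def separation_stats(observed_sd, rmse):
    """Standard Rasch separation statistics from the observed SD of
    the measures and the root mean square error of estimation."""
    # "True" SD: observed variance with error variance removed.
    true_sd = math.sqrt(max(observed_sd**2 - rmse**2, 0.0))
    g = true_sd / rmse                  # separation ratio G
    reliability = g**2 / (1 + g**2)     # separation reliability
    strata = (4 * g + 1) / 3            # strata index H
    return g, reliability, strata

# Working backwards from the reported reliability of .91:
# R = G^2 / (1 + G^2)  =>  G = sqrt(R / (1 - R)) ~ 3.18, so
# H = (4G + 1) / 3 ~ 4.57, close to the reported strata index
# of 4.55 (the small difference reflects rounding).
g = math.sqrt(0.91 / 0.09)
strata = (4 * g + 1) / 3
```

A reliability of .91 thus implies roughly four to five statistically distinct levels of writing proficiency among the examinees, matching the reported strata index.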

In addition to the statistics presented earlier, FACETS provided a statistic testing the hypothesis that the examinee proficiency measures were a random sample from a normal distribution. This statistic took the form of...
