Introduction to Many-Facet Rasch Measurement

Analyzing and Evaluating Rater-Mediated Assessments. 2nd Revised and Updated Edition


Thomas Eckes

Since the early days of performance assessment, human ratings have been subject to various forms of error and bias. Expert raters often come up with different ratings for the very same performance, and it seems that assessment outcomes largely depend upon which raters happen to assign the rating. This book provides an introduction to many-facet Rasch measurement (MFRM), a psychometric approach that establishes a coherent framework for drawing reliable, valid, and fair inferences from rater-mediated assessments, thus answering the problem of fallible human ratings. Revised and updated throughout, the Second Edition includes a stronger focus on the Facets computer program, emphasizing the pivotal role that MFRM plays in validating the interpretations and uses of assessment outcomes.

5. A Closer Look at the Rater Facet: Telling Fact from Fiction



Instead of working on common ground, raters often appear to vary considerably in terms of deeply ingrained, more or less idiosyncratic rating tendencies that threaten the validity of the assessment outcomes. This chapter addresses in detail the measurement implications of ineradicable rater variability, thus enabling researchers and assessment practitioners to separate facts about rater behavior from fallacious beliefs about raters functioning interchangeably. After pinpointing the precision of rater severity estimates, the focus shifts to the analysis of rater fit, including a detailed discussion of control limits for infit and outfit statistics. Further key issues of the present chapter are the study of central tendency and halo effects, and the extent to which raters can be considered independent experts in the rating process. The chapter concludes by returning to the topic of interrater agreement and reliability.

5.1    Rater measurement results

5.1.1  Estimates of rater severity

The preceding discussion has made it clear that the 18 raters under study differed greatly in their measures of severity. Let us now look at more detailed measurement results for each individual rater. Severity estimates, their precision, and other relevant statistics are presented in Table 5.1.

Table 5.1:  Measurement Results for the Rater Facet.


Note. MSW = mean-square infit statistic. tW = standardized infit statistic. MSU = mean-square outfit statistic. tU = standardized outfit statistic...
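As a rough illustration of the two mean-square statistics named in the note, the sketch below computes them for a single rater from observed ratings, model-expected ratings, and model variances. It uses the textbook Rasch definitions: outfit is the unweighted mean of squared standardized residuals, and infit is the information-weighted version. The function name `mean_square_fit` and its arguments are hypothetical, the expected values and variances are assumed to be given by an already estimated model, and the sketch is not meant to reproduce how the Facets program computes its output.

```python
from typing import Sequence, Tuple


def mean_square_fit(
    observed: Sequence[float],
    expected: Sequence[float],
    variance: Sequence[float],
) -> Tuple[float, float]:
    """Return (infit, outfit) mean-square statistics for one rater.

    observed: ratings the rater actually assigned
    expected: model-expected ratings for the same observations (assumed given)
    variance: model variances of those observations (assumed given, all > 0)
    """
    n = len(observed)
    if n == 0 or len(expected) != n or len(variance) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v <= 0 for v in variance):
        raise ValueError("model variances must be positive")

    # Squared raw residuals: (observed - expected)^2
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]

    # Infit (weighted): sum of squared residuals over sum of variances
    infit = sum(sq_resid) / sum(variance)

    # Outfit (unweighted): mean of squared standardized residuals
    outfit = sum(r / v for r, v in zip(sq_resid, variance)) / n

    return infit, outfit


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not data from the book
    infit, outfit = mean_square_fit([1, 1], [0.5, 0.9], [0.25, 0.09])
    print(f"infit MS = {infit:.3f}, outfit MS = {outfit:.3f}")
```

Both statistics have an expected value of 1 when the data fit the model. Because outfit is unweighted, it reacts more strongly to single unexpected ratings on observations with small model variance, whereas infit gives more weight to observations with large model variance.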
