Introduction to Many-Facet Rasch Measurement

Analyzing and Evaluating Rater-Mediated Assessments. 2nd Revised and Updated Edition


Thomas Eckes

Since the early days of performance assessment, human ratings have been subject to various forms of error and bias. Expert raters often come up with different ratings for the very same performance, and it seems that assessment outcomes largely depend upon which raters happen to assign the rating. This book provides an introduction to many-facet Rasch measurement (MFRM), a psychometric approach that establishes a coherent framework for drawing reliable, valid, and fair inferences from rater-mediated assessments, thus answering the problem of fallible human ratings. Revised and updated throughout, the Second Edition includes a stronger focus on the Facets computer program, emphasizing the pivotal role that MFRM plays in validating the interpretations and uses of assessment outcomes.

4. Many-Facet Rasch Analysis: A First Look


In this chapter, the basic MFRM modeling approach is explained using the essay rating data. First, various steps that need to be taken in preparation for an MFRM analysis are discussed. These include formatting the input data and building a specification file. A key component of the results is a graphical display showing the joint calibration of examinees, raters, criteria, and the rating scale categories. Each part of this display is explained with respect to its substantive implications. Then, statistical indicators that summarize information on the variability within each facet are formally described and, in a subsequent section, applied to the sample data. The focus is on the different meanings these indicators may assume depending on the facet under consideration. The chapter concludes with a brief look at the contentious issue of global model fit.

4.1    Preparing for a many-facet Rasch analysis

To summarize briefly, the input data consisted of ratings that 18 raters awarded to essays written by 307 examinees in a live examination. Each essay was independently rated by at least two raters according to three criteria. Ratings were provided along the four-category TDN scale (with TDN levels scored from 2 to 5).

The data were analyzed by means of the computer program FACETS (Version 3.71; Linacre, 2014a). This program used the scores that raters awarded to examinees to estimate individual examinee proficiencies, rater severities, criterion difficulties, and scale category difficulties. FACETS accepts...
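
To make the four kinds of estimates named above more concrete, the following minimal sketch computes category probabilities under a three-facet rating scale formulation, in which the log-odds of category k over category k-1 equal examinee proficiency minus rater severity minus criterion difficulty minus a category threshold. This is an illustration of the model form only, not of how FACETS estimates its parameters. The function name, the symbols (`theta`, `alpha`, `beta`, `taus`), and all numerical values are assumptions chosen for the example; none of them come from the essay rating data.

```python
import math


def category_probabilities(theta, alpha, beta, taus):
    """Category probabilities for one examinee-rater-criterion combination.

    Assumed (hypothetical) notation, all in logits:
      theta : examinee proficiency
      alpha : rater severity
      beta  : criterion difficulty
      taus  : thresholds, one per step between adjacent categories
    Returns a list with len(taus) + 1 probabilities, lowest category first.
    """
    # The log-numerator of the lowest category is fixed at 0; each higher
    # category adds one more step term (theta - alpha - beta - tau_k).
    log_numerators = [0.0]
    for tau in taus:
        step = theta - alpha - beta - tau
        log_numerators.append(log_numerators[-1] + step)

    # Subtract the maximum before exponentiating for numerical stability.
    max_log = max(log_numerators)
    weights = [math.exp(value - max_log) for value in log_numerators]
    total = sum(weights)
    return [weight / total for weight in weights]


# Four scale categories (TDN levels scored 2 to 5) imply three thresholds.
tdn_levels = [2, 3, 4, 5]
hypothetical_taus = [-1.5, 0.0, 1.5]

lenient = category_probabilities(theta=1.0, alpha=-0.5, beta=0.0,
                                 taus=hypothetical_taus)
severe = category_probabilities(theta=1.0, alpha=0.5, beta=0.0,
                                taus=hypothetical_taus)

# Expected score for the same examinee under a lenient and a severe rater.
expected_lenient = sum(level * p for level, p in zip(tdn_levels, lenient))
expected_severe = sum(level * p for level, p in zip(tdn_levels, severe))
print(expected_lenient, expected_severe)
```

With everything else held constant, the more severe rater yields the lower expected score for the same examinee, which is the dependence of outcomes on the rater that the model is meant to adjust for.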
