
Introduction to Many-Facet Rasch Measurement

Analyzing and Evaluating Rater-Mediated Assessments – 2nd Revised and Updated Edition


Thomas Eckes

Since the early days of performance assessment, human ratings have been subject to various forms of error and bias. Expert raters often arrive at different ratings for the very same performance, so that assessment outcomes depend to a considerable degree on which raters happen to assign the ratings. This book provides an introduction to many-facet Rasch measurement (MFRM), a psychometric approach that establishes a coherent framework for drawing reliable, valid, and fair inferences from rater-mediated assessments, thus addressing the problem of fallible human ratings. Revised and updated throughout, the Second Edition includes a stronger focus on the Facets computer program, emphasizing the pivotal role that MFRM plays in validating the interpretations and uses of assessment outcomes.

9. Special Issues

Extract


The MFRM approach to rater-mediated assessment raises a number of more specialized issues. Some concern the design according to which many-facet data are collected; others relate to the benefits that accrue from an MFRM analysis, such as providing detailed rater feedback or informing standard setting. This chapter deals with design issues first, since they figure prominently in any kind of many-facet data analysis. It then discusses the practical benefits the MFRM approach holds for giving feedback to raters and for evaluating judgments gathered in standard-setting studies. A more technical section highlights key differences between the MFRM approach and generalizability theory (G-theory), which builds on classical test theory (CTT). The chapter concludes with a brief description of computer software suited to implementing MFRM models and various model extensions.
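For orientation, the basic MFRM model underlying the analyses in this chapter can be stated in log-odds form. The following is one common rating scale formulation for a three-facet design (examinees, tasks, raters); notation varies across sources, and the facet labels shown here are just one typical choice:

```latex
% Rating scale form of a three-facet MFRM model:
% log-odds of examinee n receiving category k rather than
% category k-1 from rater j on task i.
\[
  \ln\!\left(\frac{p_{nijk}}{p_{nij(k-1)}}\right)
  = \theta_n - \beta_i - \alpha_j - \tau_k
\]
% theta_n : proficiency of examinee n
% beta_i  : difficulty of task i
% alpha_j : severity of rater j
% tau_k   : threshold between categories k-1 and k
```

Each facet thus contributes an additive term on the logit scale, which is what allows rater severity to be estimated separately from examinee proficiency and task difficulty.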

9.1  Rating designs

In rater-mediated assessment, great care needs to be taken with the design according to which the rating data are collected. For example, when raters score the performances of examinees on a number of tasks, questions like these arise: Should all raters score all examinees, or would it suffice for subsets of raters each to score a particular subset of examinees? What is a reasonable number of raters per examinee, how many examinees should each rater score, and should each rater score examinee performance on each task? With only a few raters scoring a subset of examinees, how should raters be assigned to examinees in order to make...
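One requirement that any incomplete rating design must meet is connectedness: every rater and every examinee must be linked, directly or indirectly, through common ratings, so that all measures can be placed on a single scale. The following is a minimal sketch (in Python, with hypothetical data and a hypothetical function name) of how a design's connectedness might be checked by treating raters and examinees as nodes of a bipartite graph:

```python
# Sketch: checking whether an incomplete rating design is connected,
# i.e., whether all raters and examinees are linked through common
# ratings. Data and names below are illustrative, not from the book.

from collections import defaultdict, deque

def is_connected(assignments):
    """assignments: iterable of (rater, examinee) pairs.
    Returns True if the bipartite rater-examinee graph is connected,
    a necessary condition for placing all elements of both facets
    within one frame of reference."""
    graph = defaultdict(set)
    for rater, examinee in assignments:
        graph[("R", rater)].add(("E", examinee))
        graph[("E", examinee)].add(("R", rater))
    if not graph:
        return True
    # Breadth-first search from an arbitrary node.
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    # Connected iff the search reached every rater and examinee.
    return len(seen) == len(graph)

# Two disjoint rater-examinee clusters -> not connected.
design_a = [("r1", "e1"), ("r1", "e2"), ("r2", "e3"), ("r2", "e4")]
# One overlapping rating links the two clusters.
design_b = design_a + [("r1", "e3")]

print(is_connected(design_a))  # False: r1 and r2 share no examinees
print(is_connected(design_b))  # True: the overlap creates a link
```

As the two toy designs illustrate, even a single overlapping rating can link otherwise disjoint subsets of raters and examinees, which is why carefully planned overlap is central to the rating designs discussed in this section.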
