Validating Analytic Rating Scales

Price: 88.00 CHF
ISBN: 978-3-653-96042-6

Validating Analytic Rating Scales

A Multi-Method Approach to Scaling Descriptors for Assessing Academic Speaking

by Armin Berger (Author)

Linguistics

Series: Language Testing and Evaluation, Volume 37

Summary

This book presents a unique inter-university scale development project, with a focus on the validation of two new rating scales for the assessment of academic presentations and interactions. The use of rating scales for performance assessment has increased considerably in educational contexts, but the empirical research to investigate the effectiveness of such scales is scarce. The author reports on a multi-method study designed to scale the level descriptors on the basis of expert judgments and performance data. The salient characteristics of the scale levels offer a specification of academic speaking, adding concrete details to the reference levels of the Common European Framework. The findings suggest that validation procedures should be mapped onto theoretical models of performance assessment.

Excerpt

Cover
Title
Copyright
About the Author
About the Book
This eBook can be cited
Table of Contents
Acknowledgements
List of figures
List of tables
List of abbreviations
1 Introduction
1.1 Background to the study
1.2 Statement of the problem
1.3 Purpose of the study
1.4 Research questions
1.5 Structure of the book
2 Performance assessment of second language speaking
2.1 Introduction to performance assessment
2.2 The speaking construct in performance assessment
2.2.1 Pre-communicative approaches
2.2.2 Models of communicative competence
2.2.3 Approaches to speaking
2.3 Models of performance assessment
2.3.1 McNamara (1996)
2.3.2 Skehan (1998, 2001)
2.3.3 Bachman (2002)
2.3.4 Fulcher (2003)
2.4 Rating scales in performance assessment
3 Rating scales
3.1 General characteristics
3.2 Types of rating scales
3.3 Theoretical and methodological concepts in rating scale development
3.3.1 Intuitive approaches
3.3.2 Theory-based approaches
3.3.3 Empirical approaches
3.3.4 Triangulation of approaches
3.4 Controversy over rating scales
4 Rating scale validation
4.1 Validity and validity evidence
4.2 Rasch-based rating scale validation
4.3 Dimensionality
4.4 Conclusion
5 The ELTT rating scales
5.1 The development process
5.1.1 Intuitive phase
5.1.2 Qualitative phase
5.2 The ELTT construct
5.2.1 Lexico-grammatical resources and fluency
5.2.2 Pronunciation and vocal impact
5.2.3 Structure and content
5.2.4 Genre-specific presentation skills: formal presentations
5.2.5 Content and relevance (interaction)
5.2.6 Interaction
5.3 Descriptor formulation
5.4 ELTT speaking ability
5.5 Conclusion
6 Descriptor sorting
6.1 Validating the ELTT scales
6.2 Rationale
6.3 Methodology
6.3.1 Participants
6.3.2 Instruments and procedures
6.4 Analysis
6.5 Results and discussion
6.5.1 Inter-rater reliability
6.5.2 Match between intended and empirical scale
6.5.3 Descriptor analysis
6.6 Preliminary conclusions
6.6.1 Level allocation
6.6.2 Specificity of proficiency levels
6.6.3 Descriptor wording
6.6.4 Recommendations for scale revision
6.7 Conclusion
7 Descriptor calibration
7.1 Rationale
7.2 Analysis
7.2.1 Rasch measurement
7.2.2 Specification of a measurement model and FACETS output
7.2.3 Measurement quality control
7.2.4 Descriptor analysis
7.3 Results and discussion
7.3.1 Measurement quality control
7.3.2 Dimensionality of descriptors
7.3.3 The proficiency continuum
7.3.4 Cut-off points and content integrity
7.4 Conclusion
8 Descriptor-performance matching
8.1 Rationale
8.2 Methodology
8.2.1 Participants
8.2.2 Instruments and procedures
8.2.3 Data collection
8.3 Analysis
8.3.1 Specification of a measurement model
8.3.2 Measurement quality control
8.4 Results and discussion
8.4.1 Measurement quality control
8.4.2 Dimensionality of descriptors
8.4.3 The proficiency continuum
8.4.4 Cut-off points and content integrity
8.5 Conclusion
8.6 Comparison of methods
9 Revision of the ELTT scales
9.1 Establishing a quality hierarchy of descriptor units
9.2 The quality of descriptor units
9.3 Constructing the revised scales
9.4 Common points of reference
9.5 The modified versions of the ELTT scales
10 Conclusion
10.1 Summary
10.2 Theoretical implications
10.3 Practical recommendations
10.4 Limitations of the study
10.5 Suggestions for further research
10.6 Concluding statement
11 References
12 Appendix
12.1 Appendix 1: Original ELTT rating scales
12.2 Appendix 2: Sorting task questionnaire
12.3 Appendix 3: Consensual scales based on descriptor sorting
12.4 Appendix 4: Descriptor unit measurement report (descriptor calibration)
12.5 Appendix 5: All facet vertical ruler (sorting task)
12.6 Appendix 6: Speaking tasks
12.7 Appendix 7: Rating sheets
12.8 Appendix 8: Rater guidelines
12.9 Appendix 9: Student measurement report (descriptor-performance matching)
12.10 Appendix 10: All facets vertical ruler (descriptor-performance matching)
12.11 Appendix 11: Descriptor unit measurement report (descriptor-performance matching)

Acknowledgements

I would like to express my sincere gratitude to all those – far too numerous to mention here – who supported me during my academic journey. In particular, I wish to thank Christiane Dalton-Puffer, Günther Sigott, Tim McNamara, Charles Alderson, Ari Huhta, Rita Green, and Hermann Cesnik for the opportunity to discuss my work with them. Their insightful, instructive, and wholly useful feedback helped me shape this research. The responsibility for any errors or inadequacies that may occur in this work, of course, is entirely my own.

Thank you for sharing your great expertise!

Furthermore, I would like to express my gratitude to the members of the ELTT group who developed the two analytic rating scales I was fortunate enough to investigate: Martina Elicker, Helen Heaney, Martin Kaltenbacher, Gunther Kaltenböck, Thomas Martinek, and Benjamin Wright. Working with them has been an enjoyable and educational experience.

Thank you for your commitment to professionalism!

I am deeply indebted to my colleagues who participated as raters in the project: Nancy Campbell, Lucy Cripps, Dianne Davies, Grit Frommann, Meta Gartner-Schwarz, Anthony Hall, Helen Heaney, Claire Jones, Katharina Jurovsky, Gunther Kaltenböck, Christina Laurer, Sandra Pelzmann, Michael Phillips, Horst Prillinger, Karin Richter, Angelika Rieder-Bünemann, Jennifer Schumm Fauster, Gillian Schwarz-Peaker, Nicholas Scott, Susanne Sweeney-Novak, Andreas Weissenbäck, and Sarah Zehentner. I greatly appreciate their willingness to share their expertise and devote time – often enormous amounts – to the project for nothing but sincere gratitude in return.

Thank you for your academic idealism!

I would also like to thank all our students who generously consented to take part in the study. The spectacle of a mock exam and the doubtful privilege of being able to consider themselves participants in a study was a poor reward for real motivation and great service.

Thank you for your academic curiosity!

On a personal note, I am extremely fortunate to have had the wholehearted love and support of my family and friends. It was their patience and understanding that helped me manage to juggle a full-time teaching job, a research project, and many other professional activities. Words cannot describe the gratitude I feel towards my wife, Angela, who is the greatest source of inspiration in my life, bar none.

Sorry for not always having my priorities right!

List of figures

Figure 1: Components of language competence (Bachman 1990: 87)

Figure 2: Components of language competence (Bachman & Palmer 1996: 63)

Figure 3: Levelt’s blueprint for the speaker (Levelt 1989: 9)

Figure 4: A summary of oral skills (Bygate 1987: 50)

Figure 5: Variables influencing performance in a speaking test (McNamara 1996: 86)

Figure 6: Skehan’s (1998: 172) model of oral test performance

Figure 7: Bachman’s (2002: 467) expanded model of oral test performance

Figure 8: Fulcher’s (2003: 115) expanded model of speaking test performance

Figure 9: A framework for describing approaches to rating scale development

Figure 10: Messick’s (1989: 20) facets of validity

Figure 11: Facets of rating scale validity (Knoch 2009: 65)

Figure 12: The ELTT scale development process

Figure 13: The ELTT model of speaking ability

Figure 14: Scale category probability curves (descriptor sorting)

Figure 15: Task specifications

Figure 16: Scale category probability curves (descriptor-performance matching)

Figure 17: Classification instrument for assessing descriptor unit quality

Figure 18: Common reference points and descriptor keywords

Figure 19: An expanded model of performance assessment, based on Fulcher (2003) and Knoch (2009)

Figure 20: An expanded model for rating scale development

List of tables

Table 1:	Inter-rater reliability statistics
Table 2:	Discriminant analysis: classification results
Table 3:	Discriminant analysis: classification results according to scale criteria
Table 4:	Unilevel descriptor units with agreement figures of < 60 % in the sorting task
Table 5:	Multi-level descriptor units with agreement figures of > 60 % in the sorting task
Table 6:	Rater measurement report (descriptor sorting)
Table 7:	Criterion measurement report (descriptor sorting)
Table 8:	Category statistics (descriptor sorting)
Table 9:	Misfitting LGF descriptor units (descriptor calibration)
Table 10:	Unexpected calibrations within lexico-grammatical resources and fluency (descriptor calibration)
Table 11:	Unexpected calibrations within pronunciation and vocal impact (descriptor calibration)
Table 12:	Unexpected calibrations within structure and content (descriptor calibration)
Table 13:	Unexpected calibrations within content and relevance (descriptor calibration)
Table 14:	Synopsis of calibrated descriptor components: LGF (descriptor calibration)
Table 15:	Synopsis of calibrated descriptor components: PVI (descriptor calibration)
Table 16:	Synopsis of calibrated descriptor components: PSCW (descriptor calibration)
Table 17:	Synopsis of calibrated descriptor components: PGSP (descriptor calibration)
Table 18:	Synopsis of calibrated descriptor components: ICRW (descriptor calibration)
Table 19:	Synopsis of calibrated descriptor components: IINH (descriptor calibration)
Table 20:	Number of videotaped speaking performances
Table 21:	Rater measurement report (descriptor-performance matching)
Table 22:	Criterion measurement report (descriptor-performance matching)
Table 23:	Category statistics (descriptor-performance matching)
Table 24:	Misfitting LGF descriptor units (descriptor-performance matching)
Table 25:	Synopsis of calibrated descriptor components: LGF (descriptor-performance matching)
Table 26:	Synopsis of calibrated descriptor components: PVI (descriptor-performance matching)
Table 27:	Synopsis of calibrated descriptor components: PSCW (descriptor-performance matching)
Table 28:	Synopsis of calibrated descriptor components: PGSP (descriptor-performance matching)
Table 29:	Synopsis of calibrated descriptor components: ICRW (descriptor-performance matching)
Table 30:	Synopsis of calibrated descriptor components: IINH (descriptor-performance matching)
Table 31:	Consistency and consensus indices of measures and band allocations
Table 32:	Illustrative quality classifications
Table 33:	Distribution of descriptor unit quality
Table 34:	ELTT descriptor units of excellent quality
Table 35:	The ELTT presentation scale after reintegrating the most stable descriptor units
Table 36:	The ELTT interaction scale after reintegrating the most stable descriptor units
Table 37:	Descriptor units added for adequate construct representation
Table 38:	Presentation scale
Table 39:

List of abbreviations

Details

Pages: 395
Publication Year: 2016
ISBN (Hardcover): 9783631666913
ISBN (PDF): 9783653061833
ISBN (MOBI): 9783653960419
ISBN (ePUB): 9783653960426
DOI: 10.3726/978-3-653-06183-3
Language: English
Publication date: 2015 (December)
Keywords: Language testing, Language assessment, Assessing speaking, Performance assessment
Published: Frankfurt am Main, Berlin, Bern, Bruxelles, New York, Oxford, Wien, 2015. 395 pp., 39 tables
Product Safety: Peter Lang Group AG

Biographical notes

Armin Berger (Author)

Armin Berger is a Senior Lecturer in English as a Foreign Language in the English Department at the University of Vienna. His main research interests are in the areas of teaching and assessing speaking, rater behaviour, language assessment literacy, and foreign language teacher education.
