
Task Equivalence in Speaking Tests

Investigating the Difficulty of Two Spoken Narrative Tasks


Chihiro Inoue

This book addresses task equivalence, an issue of fundamental importance in language testing and task-based research, where equivalent tasks are a prerequisite. The main study examines two ‘seemingly-equivalent’ picture-based spoken narrative tasks using a multi-method approach that combines quantitative and qualitative methodologies: many-facet Rasch measurement (MFRM) analysis of the ratings, analysis of the linguistic performances of Japanese candidates and native speakers of English (NS), expert judgements of the task characteristics, and the perceptions of the candidates and NS. The results reveal a complex picture in which a number of variables are involved in ensuring task equivalence, and they raise issues regarding theories of task complexity and the linguistic variables commonly used for examining learner spoken language. The book has important implications for the measures that can be taken to avoid selecting non-equivalent tasks for research and teaching.



6. Discussion


6.1 Task Difficulty according to MFRM Analysis

The results for RQ1 indicate that the difficulty of Tasks A and B was significantly different according to FACETS, both in the overall Considered Judgement (CJ) ratings and in the ratings of the five rating categories. Although the difference in task difficulty is small (.40 and .52, respectively), its effect was demonstrated as possibly being crucial for part of the candidate population. This section further explores how these differences might be interpreted.

In a previous study to examine the equivalence of monologic tasks, Weir and Wu (2006) found a .74 logit difference between the two description tasks on Forms 2 and 3 of the test under investigation. Despite this large significant difference in logit values, they concluded that these two tasks were actually equivalent, arguing that the fair average values were identical; there was only a .03 difference on a 5-point rating scale. In their investigation, Weir and Wu cite a comment by Linacre, from their personal correspondence, whereby the large logit difference found for a very small difference in raw score (i.e. fair average) probably resulted from a very small rater disagreement, and that these differences are actually very trivial (Weir & Wu, 2006: 186).

In Research Question 1 of my main study, the fair average values of CJ ratings for Tasks A and B were 5.67 and 5.91, respectively, on a 10-point rating scale (i.e. Below A1 to C1). Similarly, those for the five rating criteria...
