Assessing Interactional Competence
Principles, Test Development and Validation through an L2 Chinese IC Test
Summary
In this superb, meticulously designed, intellectually coherent book based on award-winning scholarship, David Wei Dai takes the reader on a riveting journey tackling key challenges in assessing Interactional Competence. Ingenious and groundbreaking: there is no looking back.
(Talia Isaacs, University College London)
David Wei Dai’s book is an exemplary study of test development and validation. It breaks new ground in the assessment of Interactional Competence, and is an invaluable resource for novices and seasoned researchers alike.
(Carsten Roever, University of Melbourne)
Table of Contents
- Cover
- Title
- Copyright
- About the author
- About the book
- This eBook can be cited
- Dedication
- Preface
- Foreword
- Summary of the Book
- Summary of the Book in Chinese
- Acknowledgements
- Table of Contents
- List of Tables
- List of Figures
- List of Abbreviations
- Chapter 1 Introduction
- Chapter 2 Literature review
- 2.1 A philosophical account of interaction
- 2.1.1 Interaction and pragmatics
- 2.1.2 An intentionalist perspective on interaction
- 2.1.3 A rationalist-utilitarian perspective on interaction
- 2.1.4 An empiricist-interactional perspective on interaction
- 2.1.5 A unified account of interaction for assessment
- 2.2 Interaction in computer-mediated communication
- 2.2.1 CMC and L2-speaker interaction
- 2.2.2 An empiricist-interactional approach to CMC
- 2.2.3 Five CMC considerations for test design
- 2.3 Defining an IC construct: A theoretical discussion
- 2.3.1 A brief history of IC
- 2.3.2 Assessing IC
- 2.3.3 Differentiating speaking/LC and talking/IC
- 2.3.4 Strong on speaking/LC but weak on talking/IC
- 2.3.5 Strong on talking/IC but weak on speaking/LC
- 2.4 Defining an IC construct: An operational discussion
- 2.4.1 Are we measuring talking/IC or speaking/LC?
- 2.4.2 Separating IC from LC
- 2.4.3 Going beyond the mechanics of interaction: Hymes and Goffman revisited
- 2.4.4 Emotional, logical and moral IC markers
- 2.4.5 Aristotelian artistic proofs: Pathos, logos, and ethos
- 2.4.6 Membership categorization analysis: Categorial IC markers
- 2.5 Designing IC test tasks
- 2.5.1 From the target language domain to a test
- 2.5.2 Task-based needs analysis
- 2.5.3 Triangulation in needs analysis
- 2.5.4 Paucity of TBNA in L2 Chinese
- 2.6 Designing IC rating materials
- 2.6.1 IC rating materials development
- 2.6.2 The rater perspective and indigenous criteria
- 2.6.3 Test-taker exemplars in IC rating
- Chapter 3 Interpretive argument and research design
- 3.1 The inferences and assumptions in the interpretive argument
- 3.1.1 The domain description inference
- 3.1.2 The evaluation inference
- 3.1.3 The generalization inference
- 3.1.4 The explanation inference
- 3.1.5 The extrapolation inference
- 3.2 The design of the three studies
- 3.2.1 Study one, relevant assumptions and research questions
- 3.2.2 Study two, relevant assumptions and research questions
- 3.2.3 Study three, relevant assumptions and research questions
- Chapter 4 Study one: Task-based needs analysis and test design
- 4.1 Methodology of study one
- 4.1.1 Participants
- 4.1.1.1 TBNA participants
- 4.1.1.2 Test design participants
- Item review and moderation participants
- Norming session participants
- 4.1.2 Instruments
- 4.1.2.1 TBNA instruments
- Hermeneutic-Socratic interviews
- Longitudinal reflective diaries
- 4.1.2.2 Test design instruments
- Norming questionnaires
- 4.1.3 Procedures
- 4.1.3.1 TBNA procedure
- 4.1.3.2 Test design procedure
- 4.1.4 Data analysis
- 4.1.4.1 TBNA data analysis
- 4.1.4.2 Test design data analysis
- 4.2 Results and initial discussion of study one
- 4.2.1 TBNA results
- 4.2.1.1 Social actions
- 4.2.1.2 Sociopragmatic and pragmalinguistic issues
- 4.2.1.3 Interactional features and content knowledge
- 4.2.1.4 Linguistic issues and multimodal cues
- 4.2.2 The test specifications
- 4.2.3 Generating draft items
- 4.2.4 Revising the draft items
- 4.2.5 Finalizing the IC test
- Chapter 5 Study two: Pilot test, indigenous criteria, and rating materials
- 5.1 Methodology of study two
- 5.1.1 Participants
- 5.1.1.1 Pilot test test-takers
- 5.1.1.2 Pilot test raters
- 5.1.1.3 Everyday-life domain experts
- 5.1.2 Instruments
- 5.1.3 Procedures and data analysis
- 5.1.3.1 Pilot testing
- 5.1.3.2 Eliciting DEs’ indigenous IC criteria
- 5.1.3.3 Developing a DEs’ indigenous IC criteria rating scale
- 5.1.3.4 Theoretically expanding the IC rating scale
- 5.2 Results and initial discussion of study two
- 5.2.1 Pilot test findings
- 5.2.2 Domain experts’ indigenous IC criteria
- 5.2.2.1 Conflict management
- 5.2.2.2 Solidarity promotion
- 5.2.2.3 Reasoning skills
- 5.2.2.4 Personal qualities
- 5.2.2.5 Social relations
- 5.2.2.6 Linguistic choices
- 5.2.2.7 Prosodic features
- 5.2.2.8 The structure of talk
- 5.2.2.9 Strategies, cultural norms, and miscellaneous
- 5.2.3 An indigenous IC rating scale
- 5.2.3.1 Collapsing indigenous criteria into five rating categories
- 5.2.3.2 Identifying steps in the rating categories
- 5.2.3.3 Identifying sub-rating categories and extracting descriptors
- 5.2.3.4 Indigenous rating category: Conflict management
- 5.2.3.5 Indigenous rating category: Solidarity promotion
- 5.2.3.6 Indigenous rating category: Personal qualities
- 5.2.3.7 Indigenous rating category: Reasoning skills
- 5.2.3.8 Indigenous rating category: Social relations
- 5.2.4 CA and MCA validation and the generation of exemplars
- 5.2.4.1 The rationale behind the CA and MCA validation of the scale
- 5.2.4.2 The sample test task and the pilot test test-takers selected
- 5.2.4.3 Theorizing conflict management and social relations
- 5.2.4.4 Theorizing solidarity promotion and reasoning skills
- 5.2.4.5 Theorizing personal qualities
- 5.2.4.6 Address terms in social role management
- 5.2.4.7 Categories and predicates
- 5.2.4.8 Beginner L2-speakers’ category knowledge
- 5.2.4.9 The power of categorization
- 5.2.5 A theorized IC rating scale
- 5.2.5.1 Theorized rating category: Disaffiliation control
- 5.2.5.2 Theorized rating category: Affiliation promotion
- 5.2.5.3 Theorized rating category: Morality
- 5.2.5.4 Theorized rating category: Reasoning
- 5.2.5.5 Theorized rating category: Social role management
- 5.2.6 A unified model of IC
- Chapter 6 Study three: The IC test and accompanying questionnaires
- 6.1 Methodology
- 6.1.1 Participants
- 6.1.1.1 Main testing test-takers
- 6.1.1.2 Main testing test-taker peers
- 6.1.1.3 Main testing IC test raters
- 6.1.2 Instruments
- 6.1.2.1 The IC test
- 6.1.2.2 Test-taker background questionnaires
- 6.1.2.3 Self and peer-assessment questionnaires
- 6.1.2.4 Rater training materials
- 6.1.3 Procedures
- 6.1.3.1 Administering the IC test and questionnaires
- 6.1.3.2 Training raters
- 6.1.3.3 Rater rating
- 6.1.4 Data analysis
- 6.2 Results and initial discussion
- 6.2.1 Rasch analyses of IC test scores
- 6.2.1.1 The Wright map
- 6.2.1.2 The candidate measurement report
- 6.2.1.3 The rater measurement report
- 6.2.1.4 The criterion measurement report
- 6.2.1.5 The item measurement report
- 6.2.1.6 The rating scale category functioning
- 6.2.1.7 The dimensionality of the data structure
- 6.2.2 Correlation between IC and LC
- 6.2.3 Rasch analyses of questionnaires
- 6.2.3.1 The disaffiliation control sub-section
- 6.2.3.2 The affiliation promotion sub-section
- 6.2.3.3 The morality sub-section
- 6.2.3.4 The reasoning sub-section
- 6.2.3.5 The social role management sub-section
- 6.2.3.6 Overall results of self and peer IC questionnaires
- 6.2.4 Correlation between the IC test and questionnaires
- 6.2.5 Rasch analyses of extrapolation and attitude items
- 6.2.5.1 Explicit extrapolation questions
- 6.2.5.2 Test-taker attitude questions
- Chapter 7 Validity argument and overall discussions
- 7.1 The domain description inference
- 7.1.1 Domain description assumption 1
- 7.1.2 Domain description assumption 2
- 7.1.3 Domain description assumption 3
- 7.1.4 Domain description assumption 4
- 7.2 The evaluation inference
- 7.2.1 Evaluation assumption 1
- 7.2.2 Evaluation assumption 2
- 7.2.3 Evaluation assumption 3
- 7.2.4 Evaluation assumption 4
- 7.3 The generalization inference
- 7.3.1 Generalization assumption 1
- 7.3.2 Generalization assumption 2
- 7.3.3 Generalization assumption 3
- 7.3.4 Generalization assumption 4
- 7.4 The explanation inference
- 7.4.1 Explanation assumption 1
- 7.4.2 Explanation assumption 2
- 7.4.3 Explanation assumption 3
- 7.4.4 Explanation assumption 4
- 7.4.5 Explanation assumption 5
- 7.5 The extrapolation inference
- 7.5.1 Extrapolation assumption 1
- 7.5.2 Extrapolation assumption 2
- 7.5.3 Extrapolation assumption 3
- 7.6 Considerations outside the validity framework
- 7.6.1 CMC and practicality
- 7.6.2 Stakeholder take-up and assessment literacy
- 7.6.3 Building a universal model of IC
- 7.6.4 Application of the IC construct and rating scale
- 7.6.5 The parameters of the IC tasks
- Chapter 8 Conclusions
- 8.1 Significance of this book
- 8.2 Outstanding issues, limitations, and future research
- References
- Appendix I: S-H interview protocol
- Appendix II: Norming questionnaire
- English translation
- Chinese version
- Appendix III: The IC test
- Appendix IV: The IC rating scale
- English version
- Chinese version
- Appendix V: The self-assessment questionnaire
- English version
- Chinese version
- Appendix VI: The peer-assessment questionnaire
- Author Information
- Series index
Excerpt
Chapter 1 Introduction
The assessment of second language (L2) speaking ability has long followed the psycholinguistic-individualist approach, which focuses on a test-taker’s ability to produce speech and assesses this ability through linguistic components such as lexical range, grammatical accuracy, and pronunciation. This approach is adopted in most major language tests (e.g., IELTS and TOEFL) but is not without controversy. One of the main issues language assessment specialists take with the psycholinguistic-individualist approach is that it under-represents the social, interactional reality of speaking (Roever & Kasper, 2018). Speaking in real life most frequently takes the form of talking, that is, interacting with other people. If this key feature is under-represented in speaking tests, test scores cannot reliably inform end-users about how test-takers actually perform in real life, threatening the validity of the tests and the legitimacy of the testing practice.
The assessment of interaction, or more specifically of a speaker’s interactional competence (IC), questions the current practice of speaking assessment and concurrently raises new challenges. The conceptualization of IC draws heavily on Conversation Analysis (CA) research on everyday interaction. CA, however, is empiricist in its analytic approach and does not invoke etic standards in its depiction of interaction. This is misaligned with the practice of assessment, in which test-taker performances need to be benchmarked against standards. This philosophical tension between CA and testing has long troubled IC assessment researchers and needs to be acknowledged and addressed before IC assessment can proceed.
In terms of the current scope of IC assessment research, previous studies have largely focused on L2 English as a target language and face-to-face (F2F) communication as a medium of assessment (Ikeda, 2021; Youn, 2015). More research is needed in other languages to enrich our understanding of IC assessment. The F2F mode of test delivery, despite its authenticity, highlights the logistical and practical challenges of IC assessment, as it can be expensive to create F2F settings in which test-takers interact with interlocutors. The COVID-19 pandemic has further prompted test developers to rethink the practice of language assessment and explore alternative platforms for test delivery that do not require F2F interaction. More research is needed to determine whether IC assessment can be conducted online to lower costs and keep language assessment operational when F2F assessment is not feasible.
Since IC assessment is still relatively new in the field of L2 speaking assessment, IC as a test construct is under-specified, and there is no consensus on how best to define it. Previous research on IC development and assessment has examined various indicators of IC, but the choice of indicators has been based either on researchers’ judgement or on what is available in the performance data produced by L2 speakers in specific language tasks. The IC indicators investigated so far are mostly mechanistic markers that index the sequence of interaction. Interaction itself, however, encompasses a much broader range of features that are yet to be theorized in IC assessment. A methodical approach is needed to identify the IC tasks most relevant to the target L2 group. Based on the identified tasks, IC researchers can then explore which IC indicators are most crucial to success in interaction and theorize these indicators into a more comprehensive IC test construct.
A related gap in IC assessment research, and in language assessment in general, is that testing specialists have frequently relied on language experts to define the construct of a language test. This can unintentionally weaken the extrapolation inference from a language test to real-world situations, since in real life test-takers’ language skills are frequently judged by non-testing specialists rather than by language teachers or applied linguists (Sato & McNamara, 2019). The issue is particularly pronounced when IC is the trait to be assessed. The ability to interact successfully and appropriately in everyday-life settings is a skill that everyday members of any society can and need to master; it is not a specialized skill in which only language specialists can claim expertise. To better define IC as a test construct, it can therefore be beneficial to draw on non-testing experts’ views of how IC is understood and attained in everyday-life settings.
The extrapolative, or predictive, strength of IC assessment also awaits further investigation. The main function of a language test is to show that what is assessed in a testing setting can inform end-users of test-takers’ language ability in non-testing settings. IC assessment promises better extrapolation to real-world performance than traditional psycholinguistic-individualist speaking tests because IC tests assess test-takers’ ability to interact in real-life settings. Such a claim, however, needs to be validated: if an IC test cannot be shown to measure how test-takers behave in everyday real-world situations, the inferences we can draw from the test are threatened.
Finally, the most contentious point of IC assessment is its relationship with the assessment of linguistic competence (LC, encompassing indicators such as grammar, vocabulary, and pronunciation) as defined in traditional psycholinguistic-individualist speaking frameworks. Is assessing IC the same as assessing LC? If the answer is affirmative, the next logical question is why IC needs to be assessed at all, given that the assessment of LC has long been established and refined. If the answer is negative, the question that ensues is how IC and LC differ. This is a complex question, and more research is needed to unpack the relationship between the two constructs.
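To make the IC/LC question concrete: one way to probe it empirically, as Chapter 6 does when correlating IC test scores with LC measures (section 6.2.2), is to examine how strongly the two sets of scores covary. The minimal Python sketch below illustrates the logic only; the scores are entirely hypothetical and are not data from this book.

```python
# Illustrative sketch with hypothetical scores, not data from the book.
# A very high IC-LC correlation would suggest the two instruments tap
# overlapping abilities; a moderate one is consistent with IC being
# related to, yet distinct from, LC.
from scipy.stats import pearsonr

# Hypothetical test-taker scores (same ten test-takers, 0-100 scale)
ic_scores = [72, 65, 88, 54, 79, 61, 83, 70, 58, 91]
lc_scores = [68, 70, 81, 60, 74, 66, 78, 73, 55, 85]

r, p = pearsonr(ic_scores, lc_scores)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")

# r squared estimates shared variance: how much of the IC scores
# is statistically accounted for by the LC scores
print(f"Shared variance r^2 = {r**2:.2f}")
```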
In view of these research gaps, this book sets out to address them and further our understanding of the practice and value of IC assessment. Chapter 2 reviews existing IC assessment research and related issues: the philosophical underpinnings of IC assessment, the alternatives to F2F assessment, the relationship between IC and LC, and current practices in test design and rating materials development. This evaluation of the existing literature lays the groundwork for the development of the IC test in this book.
Chapter 3 details the validation framework that guides the validation of the IC test: Kane’s argument-based framework, together with the inferences and assumptions that make up the interpretive argument. The project is designed to gather backing for these assumptions, and this backing is evaluated in Chapter 7 once all the data are gathered and analysed.
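As a reading aid only, and not the book’s own implementation, the Python sketch below represents the structure of an interpretive argument in Kane’s framework: a chain of inferences (domain description, evaluation, generalization, explanation, and extrapolation, as listed in the table of contents), each resting on assumptions that require empirical backing before the next link in the chain can be trusted. The assumption counts follow the sub-sections of Chapter 7; everything else is a hypothetical illustration.

```python
# A minimal sketch (assumed structure, not the book's implementation) of how
# Kane's interpretive argument chains inferences: each inference rests on
# assumptions, and each assumption needs empirical backing before the next
# inference in the chain can be trusted.
from dataclasses import dataclass, field

@dataclass
class Inference:
    name: str
    assumptions: list[str]
    backed: dict[str, bool] = field(default_factory=dict)  # assumption -> backing found?

# The five inferences examined in Chapters 3 and 7, in order, with the
# number of assumptions each carries in Chapter 7 (4, 4, 4, 5, 3)
interpretive_argument = [
    Inference("domain description", [f"assumption {i}" for i in range(1, 5)]),
    Inference("evaluation", [f"assumption {i}" for i in range(1, 5)]),
    Inference("generalization", [f"assumption {i}" for i in range(1, 5)]),
    Inference("explanation", [f"assumption {i}" for i in range(1, 6)]),
    Inference("extrapolation", [f"assumption {i}" for i in range(1, 4)]),
]

def evaluate(argument: list[Inference]) -> None:
    """Walk the chain; the argument holds only as far as every assumption is backed."""
    for inference in argument:
        unbacked = [a for a in inference.assumptions if not inference.backed.get(a)]
        if unbacked:
            print(f"Chain stops at '{inference.name}': unbacked {unbacked}")
            return
    print("All inferences backed: the validity argument holds.")

# Example: backing found for all four domain description assumptions only
interpretive_argument[0].backed = {a: True for a in interpretive_argument[0].assumptions}
evaluate(interpretive_argument)  # -> chain stops at 'evaluation'
```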
Between Chapters 3 and 7 lies the main body of this book: three studies, reported in Chapters 4, 5, and 6, designed to validate the IC test. Chapter 4 explains the process undertaken in developing the IC test. Chapter 5 focuses on the design of the rating materials and the specification of the IC construct for the test. Chapter 6 is the main testing study, in which the test was administered to a large cohort of test-takers to examine how the test and the rating scale function. These three studies generate the evidence needed to validate the test, which is evaluated in the validity argument in Chapter 7.
The last chapter, Chapter 8, summarizes the main findings and contributions from this book, discusses the limitations of this research project, and suggests directions for future research.
Details
- Pages: 446
- Publication Year: 2024
- ISBN (PDF): 9783631885857
- ISBN (ePUB): 9783631885864
- ISBN (Hardcover): 9783631882504
- DOI: 10.3726/b21295
- Open Access: CC-BY
- Language: English
- Publication date: 2024 (June)
- Keywords: Conversation Analysis; Membership Categorization; Many-Facet Rasch; Computer-mediated communication; Pragmatics & Sociopragmatics; Pragmalinguistics; Language testing; Language assessment; Social interaction; Validity argument
- Published: Berlin, Bruxelles, Chennai, Lausanne, New York, Oxford, 2024. 446 pp., 31 fig. b/w, 69 tables.