Variability in assessor responses to undergraduate essays

An issue for assessment quality in higher education

by Sally Roisin O'Hagan (Author)
©2015 Monographs 322 Pages


Academic standards in higher education depend on the judgements of individual academics assessing student work; it is in these micro-level practices that the validity and fairness of assessment are constituted. However, the quality of assessments of open-ended tasks like the coursework essay is difficult to ascertain because of the complex and subjective nature of the judgements involved. In view of current concerns about assessment quality and standards, this book is a timely reflection on the practices of academic judgement at university. It explores assessment quality through an empirical study of essay marking in an undergraduate discipline where large class sizes and significant numbers of second language students are common. The study shows that assessors vary in their interpretations of criteria and standards and that this results in inconsistent grading of essays. The book contributes to a growing scholarship of assessment with an evidence-based explanation of why assessors disagree and a discussion of the implications of this for the validity of assessment practices at university.

Table Of Contents

  • Cover
  • Title
  • Copyright
  • About the Author
  • About the Book
  • This eBook can be cited
  • Contents
  • Preface
  • 1. Introduction
  • 1.1 Background to the research
  • 1.2 Aim of the study and research approach
  • 1.3 Overview of this book
  • 2. A review of the literature on writing assessment
  • 2.1 Empirical research on assessment in higher education
  • 2.1.1 Reliability of marks awarded at university
  • 2.1.2 Variability in assessment approaches at university
  • 2.2 Responses to assessment issues in higher education
  • 2.2.1 Institutional policy and student diversity
  • 2.2.2 ‘Good practice’ guidelines for assessment
  • 2.3 The nature of assessment practice at university
  • 2.3.1 The implicit expertise of professional practice
  • 2.3.2 The pedagogic context of university assessment
  • 2.4 Influences on judgements of writing quality
  • 2.4.1 Judgements of disciplinary writing at university
  • 2.4.2 The rating of L1/L2 English composition and ESL writing
  • 2.5 Research approaches in writing assessment
  • 2.5.1 Limitations of survey and ‘text manipulation’ studies
  • 2.5.2 Process-oriented research approaches
  • 2.6 Verbal report methodologies
  • 2.6.1 Background
  • 2.6.2 Advantages of verbal reports
  • 2.6.3 Methodological concerns about verbal reports
  • 2.6.4 Further considerations for research design
  • 3. Method
  • 3.1 Research questions
  • 3.2 Pilot study
  • 3.2.1 Participants
  • 3.2.2 Interview protocol
  • 3.2.3 Main findings of pilot study
  • 3.3 Trial verbal report study
  • 3.3.1 Participants
  • 3.3.2 Materials and procedures
  • 3.3.3 Post-sessional interviews
  • 3.3.4 Findings of trial verbal report study
  • 3.4 Main study
  • 3.4.1 The research site
  • 3.4.2 Participants
  • 3.4.3 Materials and procedures
  • The essays
  • The marking sessions
  • Verbal reporting procedures
  • Post-sessional interviews
  • 3.4.4 Data – treatment and analysis
  • Essay marks
  • Verbal reports – preparation of data
  • Verbal reports – coding
  • Verbal reports – coding protocol
  • Identifying comments
  • Repeated, elaborated, and embedded comments
  • Responses to researcher prompts
  • Assessor annotations
  • Positive and negative comments
  • Verbal reports – further analysis
  • Analysis of post-sessional interview data
  • 4. Results
  • 4.1 Essay marks
  • 4.1.1 Marks by language group
  • 4.1.2 Marks by essay
  • 4.1.3 Marks by assessor
  • 4.2 Verbal commentary
  • 4.2.1 Distribution of positive and negative comments
  • 4.2.2 Comment categories
  • 4.3 Verbal commentary: sub-categories
  • 4.3.1 Evaluations of content
  • 4.3.2 Evaluations of expression
  • 4.3.3 Evaluations of research
  • 4.3.4 Evaluations of structure
  • 4.3.5 Evaluations of presentation
  • 4.3.6 Other evaluations
  • 4.3.7 Axial coding
  • 4.4 Assessor disagreements
  • 4.4.1 Disagreements in essay marks
  • 4.4.2 Disagreements in commentary
  • Variable understandings of content criteria
  • Research – how many citations, and what kind?
  • Structure – ‘impressive’ or ‘confusing’
  • Conflicting evaluations of quality of expression
  • 5. Discussion
  • 5.1 The findings
  • 5.2 The implications
  • 5.3 Reasons for disagreement
  • 5.4 Solutions
  • 5.5 Concluding remarks
  • 6. Conclusion
  • 6.1 Summary of findings
  • 6.2 Limitations of the study
  • 6.3 Future directions for assessment reform
  • Bibliography
  • Appendix A: Verbal report characteristics
  • Appendix B: Instructions to assessors
  • ‘Verbal reports’
  • Appendix C: Coding sub-categories
  • C.1 Evaluations of Content
  • C.2 Evaluations of Expression
  • C.3 Evaluations of Research
  • C.4 Evaluations of Structure
  • C.5 Evaluations of Presentation
  • C.6 Other evaluations
  • C.7 Axial codes
  • Appendix D: Comment distributions by essay
  • D.1 Distribution of negative and positive comments
  • D.2 Comment categories
  • D.3 Post hoc analyses of chi-square test of independence: comment categories
  • Appendix E: Comment distributions by assessor
  • E.1 Distribution of negative and positive comments
  • E.2 Chi-square analysis (ratio of positive to negative comments)
  • E.3 Comment categories
  • E.4 Post hoc analyses of chi-square tests: comment categories
  • Appendix F: Post hoc analysis for chi-square tests of independence
  • F.1 Distribution of comment categories by language group
  • F.2 Distribution of sub-categories of content by language group
  • F.3 Distribution of sub-categories of expression by language group
  • F.4 Distribution of sub-categories of research by language group
  • F.5 Distribution of sub-categories of structure by language group
  • F.6 Distribution of sub-categories of presentation by language group
  • Appendix G: All essay marks


This study is a revised version of my PhD thesis submitted to the University of Melbourne in 2010. My interest in rater behaviour and writing assessment arose through my experience of providing essay writing assistance to tertiary students with different language backgrounds and varied levels of academic English proficiency. I wondered how, when later submitted for assessment, these essays were judged by tutors and lecturers: what assessment criteria were used, and how did these shape the assessors’ understandings of quality? Such questions eventually led me to the enquiry that forms the basis of this book.

For her supervision of my PhD research, I would like to thank Prof. Gillian Wigglesworth, who patiently guided me through the project and kept me on track, and whose insights helped me to shape the final work. I would also like to acknowledge the support of Assoc. Prof. Cathie Elder, who has advised me over many years and been instrumental in my development as a researcher. I am also grateful to past and present colleagues and friends in the School of Languages and Linguistics and the Language Testing Research Centre at the University of Melbourne for providing me with opportunities, assistance and encouragement, including Neomy Storch, Janne Morton, Celia Thompson, Kathryn Hill, Noriko Iwashita, Anne Isaac, Lis Grove, Annie Brown, Paul Gruba, Carsten Roever and Tim McNamara.

Finally, I would like to thank several people (anonymously) at the research site for their various contributions, without which the study would not have been possible: the students and academics who participated in the study, and the Head of Department, course leaders and administrative staff who facilitated the data collection.

Sally O’Hagan

Melbourne, Australia

August 2014



This book is concerned with the quality of assessment practices in higher education. The book’s starting point is an investigation of student language background – whether native or non-native speaker of English – as a source of variability in the responses of university assessors to undergraduate disciplinary writing. Variability amongst assessors is a concern for assessment quality because it may represent a fundamental threat to the soundness of judgements made about student performance. Specifically, the book investigates variability in the marks awarded to essays and the features of writing performance that assessors focus on in making these judgements about quality.

Concerns about the quality of assessment practices in higher education are by no means new. Increasing student numbers, however, including a growing proportion of international students, have provided a changing context for these concerns. Given the diversity of students now participating in higher education, the consistency of assessment criteria and standards across different groups of students has become a focus in considerations of quality and fairness in student assessment.

The first section of this chapter introduces the assessment issues and research problem considered in this book, contextualises these and, in doing so, explains the rationale for the research with particular reference to the higher education environment in Australia. In one sense, the book can be read as a case study of assessment practices which brings to light how locally constituted cultures of practice can threaten the validity of student assessments, and in another, as an empirically-based contribution to a broader conversation amongst researchers, practitioners and policy makers around the world who are critically engaged with the issues of assessment quality dealt with here.

1.1 Background to the research

The critical role of assessment in the facilitation of student learning has been widely acknowledged in the learning and teaching literature of the last two decades (Brown and Knight, 1994; Ecclestone and Swann, 1999; Ramsden, 1992). Assessment is even claimed to be ‘probably the single biggest influence on how students approach their learning’ (Rust, O’Donovan and Price, 2005: 232). Yet despite this appreciation of its importance, the quality of assessment in higher education (HE) has attracted widespread criticism. Concerns over whether assessment in HE is valid, reliable, and fair have been persistent in the scholarly literature (for example, Ecclestone, 2001; Fleming, 1999; Knight, 2002a; Yorke, 2008; 2011), while in the words of one critic: ‘Assessment is possibly one of the least sophisticated aspects of university teaching and learning’ (James, 2003: 197). Issues of declining academic standards and inconsistencies in marking practices in HE have also emerged as recurring themes in public debate. For example, in Australia, in an issues paper arising from a 2002 federal government review of HE (DEST, 2002a), it was noted that universities faced ‘serious accusations’ of ‘dumbing down’ and ‘soft marking’ (para.72), and that ‘the variability in assessment approaches has the potential to cause domestic and international students considerable confusion’ (para.152). In examples from the media, news headlines have pointed to allegations of managerial pressure to pass under-performing students,1 and in current affairs television, attention given to the issue of standards has suggested that universities are lowering standards for international students with inadequate English language skills.2 In the UK, university assessment has been accused of being a ‘lottery’,3 while in the United States, concerns over ‘grade inflation’ have been aired on the dedicated website, gradeinflation.com.4

While such claims about inconsistencies and other quality issues in assessment are difficult to substantiate because they are so often anecdotal (and in the case of declining standards, as is argued by Yorke (2008), the data that would be required for a valid comparison of standards over time are, in fact, not available), concerns nevertheless proliferate within a context of a lack of confidence in HE assessment. The essay assignment, increasingly emphasised in undergraduate education (Robson, Francis and Read, 2002), is often the focus of attention in this regard. This is not least because performance on essay tasks has become the most common measure of academic success (Kusel, 1992), having overtaken alternatives such as end-of-course examinations (Hounsell, 1997), but also because of the inherent difficulty of obtaining reliable assessments of complex, qualitative performances like the student essay (Bloxham, 2009; Knight and Yorke, 2003; Yorke, 2008; 2011). As Knight (2002b) argues, the less defined and/or more complex the task, the greater the levels of subjectivity and uncertainty in assessment. What is more, the essay assignment is typically a high stakes task for students, worth a substantial proportion of their total marks for a given course. Unfortunately, the lack of research on assessment of the university essay in particular (Robson et al., 2002; Smith and Coombe, 2006), coupled with an absence of discussion around the actual practices of university assessment (Bloxham, 2009; James, 2003) – or indeed, arguably, a failure to relate theory to practice (Joughin, 2008) – does little to dispel the concerns of critics who believe, as Knight (2002a) has suggested, that practices in high stakes undergraduate assessment are in ‘disarray’.

The need to address praxis concerns about assessment in HE is compelled by several factors that are challenging universities to scrutinise their current approaches and to consider the need for assessment reform. One of these factors is the widening of participation in HE and the inevitable growth in student diversity that accompanies this (Macdonald and Stratta, 2001; Scott, 2005). In Australia, international education has experienced rapid growth concentrated in the HE sector, where there was a fourfold increase in the number of international students in the ten years from 1995 to 2005 (Australian Bureau of Statistics, 2007). Because this rapid expansion in international education coincided with a period of little to no growth in the numbers of domestic students (Marginson, 2007), international students have now come to account for one quarter of all HE students in Australia (Department of Education, 2014). The majority of these students (81%) are from the Asian region, and are from predominantly non-English speaking backgrounds, with China, Singapore, Malaysia, Hong Kong and India being the top source countries (Department of Education, 2014). International students thus contribute greatly to the diversity in student culture, language, and educational background found on Australian campuses today. Similar trends can be observed in the USA, where a 32% increase in international students has been recorded over the last ten years (Institute of International Education, 2012), and in the UK, where more than 40% of postgraduates in the year 2012–13 were international students (UKCISA, 2014). Also contributing to this student diversity is the range of language and cultural backgrounds amongst domestic students, or the diversity within society (Borland and Pearce, 2002). In 2012, for example, for 15% of domestic students at university the predominant language used at home was a language other than English, and 22% of domestic students were born overseas (Department of Education, 2012).

With this diversity in mind, the research reported in this book undertakes to address the lack of research attention to assessment quality in the realm of university essay marking by investigating micro-level assessment practices in an undergraduate subject at a large Australian university. With a view to researching variability in university assessors’ responses to essays, a study was carried out in which the same set of essays was marked by several different assessors. The study considered the impact of student language background by comparing assessors’ responses to essays by native speakers (NS) and non-native speakers (NNS) of English.

In so far as specialised services in English language and academic skills development are typically provided for international students in an effort to provide students with the best opportunity for academic success, one might say that Australian universities are adequately responsive to the needs of a linguistically diverse student population. Since student performance on the coursework essay is a common measure of academic success, the teaching of writing skills to international students typically aims to improve not only writing quality, but also students’ understandings of what assessors are looking for when they mark these essays. However, while the teaching of academic writing is supported by a vast body of pedagogically motivated research on what assessors report about their perceptions of student writing problems, and about the writing skills they value, there has been very little complementary research on assessment in practice (Smith and Coombe, 2006). Consequently, little is known about how university assessors actually respond to student writing in the context of disciplinary assessment, such as where the coursework essay is used, including if, and how, assessors may differ in their responses to NS essays compared with NNS essays. By implication, little is known about whether NS and NNS students are receiving fair treatment at university.

Certainly, differences between NS and NNS writing in English have been well documented in the research literature on second language writing, and the impact of these differences on readers has been explored to an extent. In a review of a large number of empirical studies, Silva (2001) observes that there is a great amount of similarity between first and second language (L1 and L2) writing, but concludes overall that L1 and L2 writing differ in ‘numerous and important ways’ (p. 671), citing a range of salient differences in text features (including fluency, accuracy and structure). In non-disciplinary contexts (such as English composition and language testing), research on the rating of writing has shown that the distinctive features of NNS writing can be implicated in negative judgements of writing quality (for example, Hamp-Lyons and Zhang, 2001; Scarcella, 1984). Although it might be reasonable to hypothesise that the same could be occurring in the assessment of disciplinary writing at university, research findings on the comparative treatment of NS and NNS writing are inconclusive. A number of studies (including Campbell, 1990; Carlson, 1988; and Park, 1988, for instance) have shown that NNS texts are more likely to be considered poorer quality than NS texts (by NS judges, at least), but it is also commonly reported that readers may be more lenient in their judgements if they perceive the writer to be a NNS (for example, Chase and Wakefield, 1984; Haswell, 1998a; Janopoulos, 1992; Jenkins, Jordan and Weiland, 1993).

Therefore, in the context of HE disciplinary study, while student language background might be expected to have some bearing on the assessment of written assignments, it is unclear whether and/or how students might be advantaged or disadvantaged by their language background. Moreover, it is difficult to contemplate questions about the treatment of NS and NNS essays, and the fundamental issue of fairness for these groups of students, without first having a fuller knowledge of assessment practices in HE than presently exists.

There are several reasons for inquiring into assessment practices in general, and the possibility of differential treatment of NS and NNS essays in particular. One of these is the entitlement of all students to assessments that are based on sound judgements; that is, marks should provide valid and accurate information about academic performance (the rights of stakeholders, including employers of graduates and other consumers of assessment data, are likewise involved). Indeed, some research suggests that student satisfaction with assessment hinges on perceptions of ‘fair’ treatment (Flint and Johnson, 2011). Further, where concerns about assessment relate to group differences – in this case, of language, culture and educational background – student rights to equitable assessment are at stake.

In addition to the growth in student diversity referred to above, there are a number of other features of the current Australian HE context that make research on assessment practice particularly timely. The first of these concerns changes to the funding of HE that have left universities increasingly reliant on international fee revenue (Marginson, 2007; Long, 2010). The reason this is important for assessment is that some areas of concern about assessment quality have been linked to fee-paying students in general, and international students in particular. In fact, James (2003) traces a path between government concern for ensuring the quality of academic standards – which, as argued in the government discussion paper on HE, Striving for Quality (DEST, 2002a: para.142), are ultimately determined by ‘the assessment practices of individual academics’ – and the reliance of the HE sector on international fee revenue. With respect to similar circumstances in the UK, Ecclestone (1999) has also drawn a link between academic standards and a market approach to education, suggesting that financial pressures may influence assessment decisions, while Hawe (2003), in relation to HE in New Zealand, argues that the need to maintain student numbers may be implicated in the reluctance of some lecturers to fail students. The influence of the reduction in public funding of universities may also be manifest, as Yorke (2008) suggests, in the presence of a relatively new kind of stakeholder in the assessment process – the student ‘customer’ who, in paying for their education, might be anticipated to make ‘enhanced demands for value for money’ (Yorke, 2008: 197).

While the reality or extent of managerial and/or consumer pressure to pass fee-paying students is uncertain, there is, however, a growing perception that larger course enrolments resulting from widening participation (Mulryan-Kyne, 2010) have stimulated moves towards greater formalisation and transparency in assessment (Ecclestone, 2001). One of the reasons for this is to facilitate consistent communication around assessment to students in different lecture and tutorial groups. Another motivation relates to the greater utilisation of sessional staff in universities, which is a response to the demands of teaching and assessing larger numbers of students, at the same time as it is a reaction to financial pressures faced by universities. The increased reliance on sessional staff, implications of which have been discussed by a number of authors (for example, Smith and Coombe, 2006), creates particular problems for assessment because of the difficulties associated with the sharing of knowledge and expectations where teaching teams consist of loosely bound communities (Price, 2005).


ISBN (Softcover)
Publication date: 2014 (October)
Keywords: fairness, empirical study, subjective nature
Bern, Berlin, Bruxelles, Frankfurt am Main, New York, Oxford, Wien, 2014. 321 pp.

Biographical notes

Sally Roisin O'Hagan (Author)

Sally O’Hagan is a Research Fellow in the Language Testing Research Centre, School of Languages and Linguistics at the University of Melbourne where she also teaches in the academic English program. Her research interests include assessment literacy, specific purpose language testing, and assessing young language learners. She is co-editor of Papers in Language Testing and Assessment.

