Computational Text Analysis and Reading Comprehension Exam Complexity

Towards Automatic Text Classification

by Trisevgeni Liontou (Author)
Monographs XVIII, 278 Pages
Series: Language Testing and Evaluation, Volume 36

Table of Contents

  • Cover
  • Title
  • Copyright
  • About the author(s)/editor(s)
  • About the book
  • This eBook can be cited
  • Abstract
  • List of Appendices
  • List of Abbreviations
  • Table of Contents
  • 1. Introduction
  • 1.1 Rationale of the study
  • 1.2 Aim of the study
  • 1.3 Usefulness of the study
  • 1.4 Book Structure
  • 2. Literature Review
  • 2.1 Introduction
  • 2.2 Readability Formulas
  • 2.3 Text structural complexity
  • 2.3.1 Text organisation
  • 2.3.2 Halliday & Hasan’s Model of Text Cohesion
  • 2.4 Lexicogrammatical complexity
  • 2.4.1 Lexical Density
  • 2.4.2 Grammatical Intricacy
  • 2.4.3 Lexical Diversity
  • 2.4.4 Propositional Idea Density
  • 2.4.5 Word Frequency
  • 2.4.6 Idioms
  • 2.4.7 Phrasal Verbs
  • 2.4.8 Additional text variables
  • 2.5 Reader Variables
  • 2.5.1 Content schemata & reading comprehension
  • 2.5.2 Formal schemata & reading comprehension
  • 2.5.3 Topic preference & reading comprehension
  • 2.5.4 Background knowledge & test bias
  • 2.5.5 Test-takers’ strategies & reading comprehension
  • 2.5.6 Sex-based differences & reading comprehension
  • 2.5.7 Additional test-takers’ characteristics & reading comprehension
  • 2.6 Concluding remarks
  • 3. Research Methodology
  • 3.1 Introduction
  • 3.2 The KPG English Reading Corpus
  • 3.3 Automated Text Analysis Tools
  • 3.3.1 Basic Text Information
  • 3.3.2 Text genre specification
  • 3.3.3 Word Frequency Indices
  • 3.3.4 Readability Indices
  • 3.3.5 Propositional Idea Density Indices
  • 3.3.6 Lexical Richness Indices
  • 3.3.7 Text Abstractness Indices
  • 3.3.8 Syntactic Complexity Indices
  • 3.3.9 Cohesion & Coherence Indices
  • 3.3.10 Referential & Semantic Indices
  • 3.3.11 Psycholinguistic Processes Indices
  • 3.3.12 Idioms & Phrasal Verbs Indices
  • 3.4 The KPG National Survey for the English Language Exams
  • 3.4.1 The sampling frame
  • The sample size
  • Sample representativeness
  • Stratified random sampling
  • 3.4.2 The KPG English Survey: Design & Application
  • Why a questionnaire?
  • Operationalizing the questionnaire
  • Types of questions
  • The rating system
  • Question wording
  • Question sequencing
  • Questionnaire layout
  • The opening section
  • Questionnaire length & language
  • Ethical issues
  • 3.4.3 Piloting the KPG English Survey Questionnaire
  • 3.4.4 Administering the KPG English Survey Questionnaire
  • 3.4.5 Processing the KPG English Survey Data
  • 3.5 Reading Comprehension Task Score Analysis
  • 3.6 Triangulation
  • 4. Computational Text Analysis: Findings
  • 4.1 Text Analysis
  • 4.1.1 Basic Text Information
  • 4.1.2 Word Frequency Analysis
  • 4.1.3 Readability Formulas Scores
  • 4.1.4 Propositional Idea Density & Lexical Richness Scores
  • 4.1.5 Text Abstractness Analysis
  • 4.1.6 Syntactic Complexity Analysis
  • 4.1.7 Reference & Cohesion Analysis
  • 4.1.8 Psycholinguistic Processes Analysis
  • 4.1.9 Additional Text Variables Analysis
  • 4.2 Automatic Text Classification Model
  • 4.3 Model Validation Procedure
  • 5. KPG Test-Takers’ Performance & Perceptions: Research Findings
  • 5.1 KPG Reading Performance & Text Features
  • 5.1.1 Reading Performance & Text Features: An Across-Levels Analysis
  • 5.1.2 B2 Reading Performance & Text Features
  • 5.1.3 C1 Reading Performance & Text Features
  • 5.1.4 Construct-validity of the KPG language exams in English
  • 5.2 KPG Test-Takers’ Perceptions
  • 5.2.1 KPG Test-Takers’ Profile
  • 5.2.2 KPG test-takers’ personal characteristics & reading difficulty
  • 5.2.3 KPG test-takers’ perceptions of the Reading Comprehension Test Paper
  • 5.2.4 KPG test-takers’ perceptions vis-à-vis text features
  • 5.2.5 KPG test-takers’ strategies vis-à-vis text features
  • 5.2.6 Additional reader variables vis-à-vis text features
  • 6. Discussion & Conclusions
  • 6.1 Usefulness of the study
  • 6.2 Research limitations
  • 6.3 Suggestions for future research & for the use of the findings
  • References
  • Appendices
  • Appendix 1 – Text Variables List
  • Basic Text Information
  • Word Frequency Indices
  • Readability Indices
  • Lexical Richness Indices
  • Text Abstractness Indices
  • Syntactic Complexity Indices
  • Cohesion & Coherence Indices
  • Referential & Semantic Indices
  • Psycholinguistic Processes
  • Additional Text Variables
  • Appendix 2 – B2 Text Analysis
  • Appendix 3 – C1 Text Analysis
  • Appendix 4 – List of the 40 new texts used during the model validation procedure
  • Appendix 5 – L.A.S.T. Text Classification Results
  • Appendix 6 – Regression Analysis: Correlation Matrix


1. Introduction

The main aim of the present study was to empirically investigate the effect specific text and reader variables have on the nature and product of the reading comprehension process in the context of language testing, with specific reference to the Greek national exams in English for the State Certificate of Language Proficiency (known by their Greek acronym, KPG). The research project was stimulated by the urgent need for empirical evidence to supplement the intuitive selection of reading texts by expert item writers and test developers. More specifically, it involved a systematic investigation to determine, on the basis of empirical data, what makes a text easy or difficult for test-takers at different levels of English language proficiency. As explained in detail in the following section and in the Literature Review chapter, research in this area is lacking: item writers select reading texts intuitively, and the test development teams of even the most well-known international testing systems rely on human judges to validate their text selection processes. Nevertheless, in recent years, advances in the field of computational linguistics have made it possible to analyse a wide range of more in-depth text characteristics which, once submitted to complex statistical analysis, can provide a principled means for test providers to assign difficulty levels to test source texts on the basis of a set of research-validated criteria.

Although the issue of text readability is long-standing and has a venerable research tradition, its impact on foreign language assessment has garnered increased attention over the last decade. The number of studies on exam validity and reliability has also increased. Yet, most well‐established international exam systems have failed to provide sound evidence of their text selection processes (Bachman et al., 1988: 128; Chalhoub‐Deville & Turner, 2000: 528; Fulcher, 2000: 487). In fact, while reviewing the main characteristics of three respected international tests, namely the International English Language Testing System (IELTS), the Cambridge ESOL exams and the Test of English as a Foreign Language (TOEFL), Chalhoub-Deville and Turner (2000: 528–30) stressed the lack of adequate documentation regarding how the level of difficulty is determined, and which processes for text selection are applied, with a view to establishing internal consistency of their tests and equivalence across parallel test forms.

According to Chalhoub-Deville and Turner (ibid: 528–9), making such information available to the public is mandatory in order to help all interested parties make informed evaluations of the quality of the tests and their ensuing scores. In addition, it is important for exam systems not only to provide information about the psychometric properties of their exams, but also to report, on a regular basis, descriptive information about their test-takers’ personal attributes, such as language background, country of residence, academic level, age and sex, since such information can be used to complement cut-off scores considered for admission or employment purposes (ibid: 529). Chalhoub-Deville and Turner concluded by pointing out that, especially nowadays, when language ability scores are used to make critical decisions that can affect test-takers’ lives, developers of large-scale tests have the responsibility not only to construct instruments that meet professional standards, but also to continue to investigate the properties of their instruments over the years and to make test manuals, user guides and research documents available to the public to ensure appropriate interpretation of test scores (ibid: 528–9; Khalifa & Weir, 2009: 17).

1.1 Rationale of the study

The current study is closely linked to earlier findings of research on reading assessment, according to which many variables such as text topic and structure, text language, background knowledge and task type, can have an impact on either the reading process or product and need to be taken into account during test design and validation (Oakland & Lane, 2004: 247). As pointed out by Alderson (2000: 81) “If the reading process or product varies according to such influences, and if such influences occur in the test or assessment procedures, then this is a risk to the validity of test scores, to the generalizability of results or to the interpretation of performances”. In fact, although a lot of research has been conducted in the field of second language acquisition with specific reference to ways of reading and text processing strategies, Alderson (2000: 104) stressed language testers’ lack of success “to clearly define what sort of text a learner of a given level of language ability might be expected to be able to read or define text difficulty in terms of what level of language ability a reader must have in order to understand a particular text”. Such information would be particularly useful in providing empirical justification for the kinds of reading texts test-takers sitting for various language exams are expected to process, which to date have been arrived at mainly intuitively by various exam systems (Alderson, 2000: 104; Allen et al., 1988: 164; Fulcher, 1997: 497; Lee & Musumeci, 1988: 173; Oakland & Lane, 2004: 243).

Fulcher (1997: 497) is another testing scholar who drew attention to the importance of text difficulty, or text accessibility, as a crucial but much-neglected area in language testing. For him, defining text difficulty is critical for test developers to become aware of the range of factors that make texts more or less accessible, so that they can select reading texts at appropriate levels for inclusion in the reading test papers of their examinations (ibid: 497). He further stressed that research in this area is particularly pertinent, as text difficulty is re-emerging as an area of great concern not only in language teaching and materials writing but also in the testing community (ibid: 497). Echoing Fulcher, Freedle and Kostin (1999: 3) postulated that, ideally, a language comprehension test ‘should’ assess primarily the difficulty of the text itself; the item should only be an incidental device for assessing text difficulty. Much earlier, Carrell (1987a: 21) emphasized the need for EFL reading teachers and materials developers to establish reliable ways of matching the difficulty of reading materials to foreign language readers: if materials are too easy, students are unchallenged and bored, and no learning occurs; if they are too difficult, students become frustrated and withdrawn, and again no learning occurs (ibid: 21). Extending Carrell’s view, one might assert that not only optimal learning but also optimal exam performance occurs when the difficulty level of testing materials is appropriately matched to readers’ capabilities. The problem, however, still lies in how to achieve this ideal.

Especially in relation to reading tests, it has been shown that text variables can have a significant effect on both test item difficulty and test scores, regardless of the employed test method, since the reading process involves two entities: the text and the reader (cf. Alderson, 2000: 61; Carr, 2006: 271; Frazier, 1988: 194; Freedle & Kostin, 1999: 5; Davies & Irvine, 1996: 173; Chall & Dale, 1995: 5; Kemper, 1988: 141; Kozminsky & Kozminsky, 2001: 187; Leong et al., 2002: 126; Phakiti, 2003a: 649–650). This means that testers should try to choose texts of an appropriate readability level for the intended audience. The effect of text content, structure and vocabulary is such that “test designers should be aware that a variation in texts might be expected to lead to different test results […] Good tests of reading and good assessment procedures in general will ensure that readers have been assessed on their ability to understand a variety of texts in a range of different topics” (Alderson, 2000: 83). Brindley and Slatyer (2002: 382) further highlighted that the rather simplistic notion of difficulty reflected in item difficulty statistics is of limited usefulness in understanding what happens when an individual test-taker interacts with an individual item. If, as Buck (1994: 164) suggested, “performance on each test item by each test-taker is a unique cognitive event”, then task design will require not only a much more detailed specification of text characteristics and conditions, but will also need to be based on a much better understanding of the interaction between text, task and reader variables (Brindley & Slatyer, 2002: 388).

Much earlier, Johnston (1984: 223) emphasized that the assumption that an item is independent of its context is unreasonable: the context clearly influences the difficulty of items based on the same text. More recently, Bachman once again stressed (posting to L-TEST, 19 February 2000) that “when we design a test, we specify the task characteristics and even try to describe the characteristics of the test-takers, but getting at the interaction is the difficult part”. To this end, a good deal of work will need to be devoted to building models of test performance that incorporate a wide range of overlapping difficulty components and to exploring their effects on test scores (Brindley & Slatyer, 2002: 391).

In fact, although the research literature is full of evidence that text difficulty is one of the most important factors in reading comprehension, many researchers still resort to readability formulas or their own experience when assigning reading levels to texts (Oakland & Lane, 2004: 244; Shokrpour, 2004: 5). However, as explained in detail in the following chapter, readability formulas have been widely criticized by both L1 and L2 language researchers for limiting their analysis to rather basic text features, such as word and sentence length, and for failing to take into account a number of additional factors that contribute to text difficulty, such as syntactic complexity and density of information (Anderson, 1983: 287; Bailin & Grafstein, 2001: 292; Carr, 2006: 282; Carver, 1976: 662; Crossley et al., 2008a: 476; Farr et al., 1990: 210; Freebody & Fulcher, 1997: 501; Lee & Musumeci, 1988: 173, Meyer & Rice, 1984: 320; Prins & Ulijn, 1998: 140–1; Spadorcia, 2005: 37; Wallace, 1992: 77).
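The criticism can be made concrete: a classic formula reduces a text to two surface counts. The sketch below computes the Flesch Reading Ease score (206.835 − 1.015 × words-per-sentence − 84.6 × syllables-per-word); the syllable counter is a deliberately naive vowel-group heuristic added here for illustration (published implementations use pronunciation dictionaries), which underlines how little of a text’s structure such measures actually inspect.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per vowel group, minus a silent final "e".
    # An illustrative assumption only, not part of the published formula.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(1, n)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text.

    Note that only sentence length and word length (in syllables) enter
    the formula -- exactly the limitation the critics point to: syntactic
    complexity, cohesion and information density are invisible to it.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

On this scale, a string of short monosyllabic sentences scores far higher (easier) than a single long Latinate sentence, regardless of how coherent or densely informative either text actually is.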

Apart from text variables, recent research on foreign language reading comprehension from a psycholinguistic perspective has highlighted the significant effect of reader factors on comprehension. It has further supported the view that a satisfactory understanding of the reading process, which involves operations at a number of different levels of processing, i.e. lexical, syntactic, semantic and discoursal, depends not only on an accurate identification of the various text elements and the connections among them, but also on readers’ prior knowledge of and interest in the topic, as well as on the strategies used to actively reconstruct text meaning (Bachman, 2000: 11; Bailin & Grafstein, 2001: 292; Brantmeier, 2005: 37; Crossley et al., 2008a: 477; Drucker, 2003: 25; Freebody & Anderson, 1983: 278; Keshavarz et al., 2007: 20; Khalifa & Weir, 2009: 19–20; Krekeler, 2006: 121; Kozminsky & Kozminsky, 2001: 187; Langer, 1984: 469; Parker et al., 2001: 308; Phakiti, 2003a: 651; Rayner et al., 2012: 246–7; Rupp et al., 2006: 445).

Finally, in recent years, a limited number of researchers in the field of language testing have been concerned with the identification of learners’ individual characteristics that may influence performance on language tests (Phakiti, 2003b: 26; Sunderland, 1993: 47; Wightman, 1998: 255). Thus, apart from the differences across individuals in their language ability, processing strategies and schemata activation, test-takers’ personal characteristics such as sex, age and years of instruction in a foreign language, as well as psychological factors such as feelings of anxiety under testing conditions, have received increased attention (Pomplun & Omar, 2001: 171; Sullivan, 2000: 373; Stricker et al., 2001: 205).

Despite the considerable advances that have been made in exploring and understanding the various aspects of foreign language acquisition and reading performance, the available research has, nevertheless, been rather unsuccessful in clearly defining and, most importantly, in prioritizing those text features that have a direct impact on text complexity and need to be accounted for during the text selection and item design process. As stated above, although readability formulas have been extensively applied in the field of foreign language teaching and testing, numerous researchers have pointed to their serious limitations and repeatedly stressed the need for a more in-depth analysis of text features, in order to better define what sort of text a learner of a given level of language ability should be expected to be able to process when sitting for a specific exam.

Weir (2005: 292) further acknowledged that, although the Common European Framework of Reference for Languages (CEFR) attempted to describe language proficiency through a group of scales composed of ascending level descriptors, it failed to provide specific guidance as to the topics that might be more or less suitable at any level of language ability, or define text difficulty in terms of text length, content, lexical and syntactic complexity. In fact, according to Weir, the argument that the CEFR is intended to be applicable to a wide range of different languages “offers little comfort to the test writer, who has to select texts or activities uncertain as to the lexical breadth of knowledge required at a particular level within the CEFR” (ibid: 293). Alderson et al. (2004: 11) also stressed that many of the terms in the CEFR remain undefined and argued that difficulties arise in interpreting it because “it does not contain any guidance, even at a general level, of what might be simple in terms of structures, lexis or any other linguistic level”. Therefore, according to Alderson et al., the CEFR would need to be supplemented with lists of grammatical structures and specific lexical items for each language for item writers or item bank compilers to make more use of it. Furthermore, with specific reference to text complexity, Shokrpour (2004: 15–16) emphasized that a systemic functional grammar criterion should be included in more studies concerned with the difficulty of the text, in order to add to our knowledge about the factors contributing to difficulty and the role each one has regarding any comprehensibility problems experienced by EFL readers when confronted with English texts.

At this point, it is worth mentioning that most studies pertinent to EFL comprehension processes and reading performance have involved small numbers of EFL test-takers taking part not in real, high-stakes exams but in experiments designed to explore a limited number of reader or text variables in isolation, experiments that in many cases produced rather contradictory results. With particular reference to the KPG language exams in English, this research project was stimulated by the need for empirical evidence regarding what features make a text easier or more difficult for Greek users of English; that is, the need to identify which texts are easier or more difficult for candidates to understand and whether their topic familiarity and reading preferences affect their perceptions of text readability.

Finally, the extensive literature in the field of testing has repeatedly emphasized the need for an exam battery to identify and take into consideration those reader variables that may be sources of measurement error and affect overall exam performance in more obscure ways. Test designers’ knowledge of the variables that can influence the reading process and product is thus in many respects linked to the validity of their reading tests; test designers need to focus on making their test items as relevant as possible to described levels of difficulty on an a priori basis, and further ensure that these are neither biased against particular test-takers nor affected in an unexpected way by the readability of the text or readers’ background knowledge (Lee & Musumeci, 1988: 173). By following such an approach, they will be able to provide evidence that the methods they employ to elicit data are appropriate for the intended purposes, that the procedures used provide stable and consistent data and, consequently, that the interpretations they make of the results are justified, since they are based on a valid and reliable exam system (Douglas, 2001: 448).


This book delineates a range of linguistic features that characterise the reading texts used at the B2 (Independent User) and C1 (Proficient User) levels of the Greek State Certificate of English Language Proficiency exams, in order to help define text difficulty per level of competence. In addition, it examines whether specific reader variables influence test-takers’ perceptions of reading comprehension difficulty. The end product is a Text Classification Profile per level of competence and a formula for automatically estimating text difficulty and assigning levels to texts consistently and reliably, in accordance with the purposes of the exam and its candidature-specific characteristics.
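The classification formula itself is developed from the regression analysis reported in Chapter 4 and is not reproduced here. Purely as a hypothetical illustration of the general shape of such a classifier, a weighted combination of computed text features thresholded into levels might be sketched as follows; the feature names, weights and cut-off value are invented for the example and are not the study’s values.

```python
# Hypothetical sketch of a regression-style difficulty classifier.
# All weights, feature names and the B2/C1 cut-off are illustrative
# assumptions, not the values derived in this study.

FEATURE_WEIGHTS = {
    "mean_sentence_length": 0.9,   # longer sentences -> harder
    "lexical_diversity": 1.4,      # more varied vocabulary -> harder
    "word_frequency_rank": -0.7,   # more frequent words -> easier
}

def difficulty_index(features: dict) -> float:
    # Weighted sum over the features the model knows about;
    # unknown features are simply ignored.
    return sum(FEATURE_WEIGHTS[name] * value
               for name, value in features.items()
               if name in FEATURE_WEIGHTS)

def assign_level(index: float, b2_cutoff: float = 10.0) -> str:
    # Two-way split mirroring the B2/C1 distinction in the study.
    return "C1" if index > b2_cutoff else "B2"
```

The design point such a sketch illustrates is that, once the feature set and weights have been validated empirically, level assignment becomes a deterministic computation rather than an act of item-writer intuition.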


Publication date: 2014 (December)
Frankfurt am Main, Berlin, Bern, Bruxelles, New York, Oxford, Wien, 2015. XVIII, 278 pp.

Biographical notes

Trisevgeni Liontou (Author)

Trisevgeni Liontou holds a PhD in English Linguistics with a specialization in Testing from the Faculty of English Studies at the National and Kapodistrian University of Athens (Greece). She also holds a BA in English Language & Literature and an MA in Lexicography: Theory and Applications, both from the same faculty, as well as an MSc in Information Technology in Education from Reading University (UK). Her current research interests include theoretical and practical issues of reading comprehension performance, computational linguistics, online teaching practices and classroom-based assessment.