This study investigates the process of rating texts written by adult ESL learners. Four experienced raters provided think-aloud protocols describing how they rated a set of 24 texts. The think-aloud data allowed analysis of the sequence of rating, raters’ interpretations of the scoring categories, and the difficulties raters faced. The study reveals the complexity of the rating process, in which raters struggle to resolve a tension between the wordings (or rules) of the rating scale and their complex, initial, intuitive impression of the text. Raters therefore need training if rating is to provide reliable measurement. The study also demonstrates that caution is needed in interpreting results from think-aloud data, despite their methodological value in this kind of study.