Corpus-based Approaches to Translation and Interpreting

From Theory to Applications

by Gloria Corpas Pastor (Volume editor) and Miriam Seghiri (Volume editor)
©2016 Edited Collection 296 Pages


Corpus-based translation studies have come a long way since they were introduced in the last decade of the 20th century. This volume offers a balanced collection of theoretical and application-orientated contributions which establish novel trends in the area of corpus-based translation and interpreting studies. Most of the theoretical contributions report on studies related to translation universals such as simplification, explicitation, normalisation, convergence or transfer. The application-orientated contributions cover areas as diverse as corpus-based applied research, training, practice and the use of computer-assisted translation tools.

Table Of Contents

  • Cover
  • Title
  • Copyright
  • About the author(s)/editor(s)
  • About the book
  • This eBook can be cited
  • Contents
  • Foreword
  • Preface
  • Descriptive Translation and Interpreting Studies (DTIS)
  • A corpus study of loans in translated and non-translated texts
  • How terminological equivalence differs from translation equivalence: Quantitative and qualitative comparisons of term variants and their translations in a parallel corpus of EU texts
  • Do translations simplify the language of the original? Some evidence from translated migrant literature
  • Explicitness of specialised terminology in popular science: an English into Spanish corpus-based study
  • Well, interpreters… a corpus-based study of a pragmatic particle used by simultaneous interpreters
  • Intermodal corpora: A novel resource for descriptive and applied translation studies
  • Applied Translation and Interpreting Studies (ATIS)
  • Corpus-based translanguaging for translation education
  • Developing trainee translators’ instrumental subcompetence using query tools and corpora
  • Corpus analysis and the translation of adverbs in specialised texts: Raising student awareness
  • How corpora can help the interpreter walk the tightrope
  • Trainees’ perspective on the use of corpora in Translation and Interpreting: a trip into the unknown
  • Corpora in computer-assisted translation: a users’ view
  • The benefit of comparable corpora: automatic translation of multiword expressions without translation resources

Foreword


Corpus-based translation studies have come a long way since they were first introduced in the 1990s, as the contents of this book aptly testify. As a whole the volume clearly illustrates the extent to which the use of language corpora in translation and interpreting has established itself both as a research area and as a methodological approach. As a research area, it concerns the tools, resources and methods which can be developed, adapted and put to test in order to investigate questions regarding translation processes and products. As a methodological approach, it concerns how corpus linguistics techniques and procedures can be applied to translation and interpreting research, education and practice.

The essays contained in the volume advance debate on the contribution of corpus linguistics to translation studies by moving forward key education and research issues and promoting fresh approaches through a diversity of methods and instruments. The data on which the contributions draw come from languages such as Dutch, French, Portuguese, English, German, Spanish and Italian, and cover quite a few different text types, ranging from institutional documents produced by the European Union (EU) to migrant literature, and from popular science to specialised medical texts, to name just a few. Corpus resources range from the very large to the very small, from well-established online corpora to local corpora created ad hoc for specific investigations or training applications, from carefully constructed and annotated resources to quickly assembled and disposable sets of texts taken from the Web. The typology of corpora, monolingual as well as bilingual, is also extremely varied. Some corpus types, like comparable corpora and parallel corpora, are already quite familiar to translation studies researchers. These types of language resources make it possible, for instance, to compare the distribution of lexical loans in translated and non-translated texts (monolingual comparable corpora), to investigate interlingual terminological variation (bilingual parallel corpora), and to identify and extract terminological units across languages (bilingual comparable corpora). Other resources such as intermodal corpora, which comprise translated written texts and transcriptions of interpreted oral production from the same sources, and monolingual speech corpora, in which audio recordings are aligned with the corresponding transcriptions, are more experimental or newly used in the field, opening up innovative avenues of research.
The assortment of software employed for processing the data is quite diversified, too: some of the tools are specifically designed to assist translators in a Computer-Assisted Translation (CAT) scenario; others are general-purpose corpus applications whose capabilities are specifically exploited for translation research and practice; others still are common text processing or query applications whose use is informed by the needs and requirements of learners and investigators.

The linguistic features considered and the methods used to carry out the analyses exemplify the scope and potential of applying corpus methodologies to the study and practice of translation and interpreting. Features investigated vary from pragmatic markers and metalinguistic operators to collocations and phraseological units. The methodological approaches implemented, though relying on different degrees of automation, all go to show how, on the one hand, the systematic examination of linguistic evidence would not be possible without recourse to corpora and, on the other, how results always need to be interpreted and evaluated by the researcher.

While the volume discusses both descriptive and practical aspects of translation and interpreting, special attention is dedicated throughout to the application of research to a variety of training environments within translator and interpreter educational contexts. These include the teaching of written translation and of simultaneous interpreting, instruction in the use of general and specialised translation technology tools, and translanguaging classes involving the planned and systematic use of two languages in teaching and learning situations. Notably, the volume is also enriched by two essays which expound the results of surveys conducted among learners and professionals regarding the extent and impact of corpus resources and related technologies in translation practice.

Initially motivated by the possibility of providing an empirical basis for theoretical research, investigations using digital data have expanded to include descriptive studies of interpreted as well as of translated texts, and research concerning the application of corpus techniques and methodologies to translator and interpreter education and professional use. This volume represents well the breadth and depth of the growing body of literature in the field of corpus-based translation and interpreting studies. The editors have done a remarkable job by assembling the insights of a wide range of distinguished practitioners and bringing out a publication which not only builds on previous research, but tackles novel aspects and issues, addressing on the one hand the concerns of scholars and researchers, on the other those of trainee and professional translators and interpreters.

Federico Zanettin

University of Perugia (Italy)

Preface


Translators' and interpreters' workplaces have changed dramatically over the last two decades. Nowadays computerised tools are essential for these professions. In this context, corpora play an important role in accessing information which was largely inaccessible before the recent advances in computer technology. Corpus-based translation and interpreting studies (CTIS) is recognised today as a major paradigm and research methodology that has transformed analysis within translation and interpreting studies and beyond. CTIS is a promising and potentially very productive field, yet it remains under-researched and not fully explored.

This volume offers a balanced collection of theoretical and application-orientated contributions which establish new trends in corpus-based translation and interpreting studies. On the one hand, the theoretical proposals, based on descriptive studies, bear testimony to the conceptual development of the discipline, and most of the studies focus on universals such as simplification, explicitation, normalisation, convergence or transfer. On the other hand, the applied studies tackle new challenges in the discipline and fall into different areas of corpus-based applied research such as training, practice and the use of computer-assisted translation (CAT) tools. The selected contributions in this volume therefore fall into two main sections: Descriptive Translation and Interpreting Studies (DTIS) and Applied Translation and Interpreting Studies (ATIS).

The section on Descriptive Translation and Interpreting Studies (DTIS) opens with a chapter written by Ana Frankenberg-Garcia entitled “A corpus study of loans in translated and non-translated texts,” in which the author uses the COMPARA corpus, a bidirectional parallel corpus of Portuguese and English fiction texts containing three million words, to compare the use of loan words in translated and non-translated fiction texts. Frankenberg-Garcia also investigates the shifts that occur from source to target text in relation to the use of loans; the analysis focuses on the frequency and on the language distribution of loans. The main results of the study show that there tend to be more loans in translations than in the original texts, that loans from the translation language tend to be eliminated, that the choice of loan languages used is related to the relative status of the source language and culture, and that translators sometimes use loans from other languages to compensate for the loss of loans from the target language or to bridge the source and the target cultures.

In the contribution “How terminological equivalence differs from translation equivalence: Quantitative and qualitative comparisons of term variants and their translations in a parallel corpus of EU texts,” Koen Kerremans and Rita Temmerman present a corpus-based approach applied to quantitative and qualitative comparisons of English environmental term variants and their translations into Dutch and French in EU parallel texts in order to search for patterns. At the same time, the authors examine, on the basis of corpus-based research, how EU translators tend to deal with lexical variation in texts that require different language versions. With respect to the quantitative part of the analysis, Kerremans and Temmerman study the quantitative differences and similarities that can be observed between English terms and their translations and to what extent quantitative patterns of terminological variation in English source texts are also reflected in the Dutch and French target texts. In the qualitative part of the analysis, the authors present patterns of interlingual variation within the corpus, i.e., the different ways in which each term in the source texts is translated into the target languages. They examine how interlingual variation in the translation of specific terms is achieved and focus specifically on deviations from literal translations of terms. Thus, they conclude that deviation from literal translations may occur and is contextually conditioned, according to the results provided by the corpus.

Philippe Humblé’s chapter “Do translations simplify the language of the original? Some evidence from translated migrant literature” revolves around three novelists who do not write in their mother tongue: Yoko Tawada, from Japan, Emine Sevgi Özdamar, born in Turkey, and Kader Abdolah, from Iran. Their novels Das nackte Auge (by Tawada), Das Leben ist eine Karawanserei (by Özdamar) and Spijkerschrift (by Abdolah) have also been translated into different European languages. According to Humblé, the three novelists seem to want to distinguish themselves from native authors by using a style that appears lexically plainer. This lexical simplification is analysed by Humblé with the help of WordSmith Tools 6.0 and leads to the conclusion that the lexical density of the translated novels is lower than that of the original novels, even when the original is already simple, i.e. there seems to be a tendency to simplify the language used in translated texts. According to Humblé, this fact validates the Translational Universals theory and allows him to link it to Berman’s Retranslation hypothesis and Toury’s Descriptive Translation Studies.
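
As a rough illustration of how lexical simplification can be quantified, the sketch below computes a type/token ratio and a standardised variant averaged over fixed-size chunks, which is one measure of lexical variety commonly reported by corpus tools. The summary above does not say which measure Humblé actually used, so the choice of measure, the function names, the simple tokeniser and the chunk size are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lower-case the text and return its alphabetic word tokens (a deliberately simple tokeniser)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def standardised_ttr(tokens, chunk_size=1000):
    """Average the type/token ratio over consecutive full chunks of chunk_size tokens.

    Averaging over equal-sized chunks makes texts of different lengths comparable.
    Falls back to the plain ratio when the text is shorter than one chunk.
    """
    starts = range(0, len(tokens) - chunk_size + 1, chunk_size)
    chunks = [tokens[i:i + chunk_size] for i in starts]
    if not chunks:
        return type_token_ratio(tokens)
    return sum(type_token_ratio(chunk) for chunk in chunks) / len(chunks)


# Hypothetical usage: compare an original and its translation.
# original = tokenize(open("original.txt", encoding="utf-8").read())
# translation = tokenize(open("translation.txt", encoding="utf-8").read())
# print(standardised_ttr(original), standardised_ttr(translation))
```

Under this assumed measure, a lower value for the translation than for the original would be consistent with the simplification tendency described above.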

Clara Inés López-Rodríguez, in her contribution entitled “Explicitness of Specialised Terminology in Popular Science: an English into Spanish Corpus-based Study,” explores the use of explicit lexical metalinguistic operators (ELMOs). ELMOs are lexico-grammatical resources used in specialised texts (in this case, science) to bridge the knowledge gap between experts and non-experts by explaining or defining specialised concepts and terminology, constituting a metalanguage of a specific discipline. López-Rodríguez presents a top-down approach reviewing different studies on the translation of popular science, relating them to the notions of explicitation and explicitness, and proposing a taxonomy of lexico-grammatical and paralinguistic resources used in science dissemination. This top-down approach is complemented by a bottom-up approach based on a corpus of popular science in English and Spanish, analysed with Sketch Engine, which shows that ELMOs are used differently in English popular science texts than in their Spanish translations.

Turning to descriptive studies focused on interpreting, Bart Defrancq, in the chapter “Well, interpreters… a corpus-based study of a pragmatic particle used by simultaneous interpreters,” investigates the use of the English pragmatic marker well by interpreters and, in particular, seeks to determine what motivates interpreters to use it. To this end, the author has compiled a corpus drawn from several sub-corpora, mainly the European Parliament Interpreting Corpus (EPIC) and the European Parliament Interpreting Corpus–Ghent (EPICG). EPIC was used to extract original English texts and English target texts interpreted from Spanish and Italian sources, while EPICG was used to obtain English target texts interpreted from French. The results of the analysis showed that interpreters do not underuse the marker well and that its use is rarely triggered by the presence of a marker in the source text.

This section ends with a descriptive study on both translation and interpreting entitled “Intermodal corpora: A novel resource for descriptive and applied translation studies,” written by Silvia Bernardini. The author introduces intermodal corpora, their potential and their challenges, and describes the European Parliament Translation and Interpreting Corpus (EPTIC), a resource that makes available for comparison samples of texts translated in different modes (in the case of EPTIC, the written mode and the spoken mode, i.e. simultaneous interpreting). The aim of this contribution is to extract comparable sets of collocation candidates from each pair of intermodal subcorpora in two language directions (English-Italian and Italian-English). In order to decide whether a word sequence used by a translator or interpreter is a collocation, Bernardini checks whether it occurs more often than expected in the general language, using two lexical association measures: Mutual Information and t-score. Bernardini states that the examples provided mainly illustrate cases of register shifts (increases or decreases of formality), meaning shifts (contraction, expansion, clarification, broadening or transformation) and normalisation.
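
The two association measures mentioned above can be sketched as follows. This is a minimal illustration using the common textbook definitions for word pairs, where the expected frequency is what chance co-occurrence would predict from the individual word frequencies. The summary does not give the exact formulas, reference corpus or cut-off values used in the chapter, so the function names and the counts in the example are hypothetical.

```python
import math


def expected_frequency(w1_freq, w2_freq, corpus_size):
    """Co-occurrence frequency expected if the two words were independent."""
    return w1_freq * w2_freq / corpus_size


def mutual_information(pair_freq, w1_freq, w2_freq, corpus_size):
    """MI = log2(observed / expected); high values favour rare but exclusive pairs."""
    expected = expected_frequency(w1_freq, w2_freq, corpus_size)
    return math.log2(pair_freq / expected)


def t_score(pair_freq, w1_freq, w2_freq, corpus_size):
    """t = (observed - expected) / sqrt(observed); high values favour frequent pairs."""
    expected = expected_frequency(w1_freq, w2_freq, corpus_size)
    return (pair_freq - expected) / math.sqrt(pair_freq)


# Hypothetical counts: the pair occurs 30 times in a 1,000,000-word reference
# corpus in which the two words occur 1,000 and 500 times respectively.
mi = mutual_information(30, 1000, 500, 1_000_000)
t = t_score(30, 1000, 500, 1_000_000)
print(f"MI = {mi:.2f}, t-score = {t:.2f}")
```

Because Mutual Information tends to reward low-frequency pairs and the t-score tends to reward high-frequency ones, the two measures are often used together, which may be why both appear in the study.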


ISBN (Hardcover)
Publication date
2016 (September)
Frankfurt am Main, Bern, Bruxelles, New York, Oxford, Warszawa, Wien, 2016. 296 pp., 42 b/w illustrations, 43 tables.

Biographical notes


Gloria Corpas Pastor is Professor in Translation and Interpreting (University of Malaga, Spain) and Visiting Professor in Translation Technologies at RIILP (University of Wolverhampton, UK). Miriam Seghiri is Senior Lecturer in Translation and Interpreting also at the University of Malaga. Their research fields include specialised translation, corpus linguistics and translation technology.

