Cross-linguistic Mediated Communication: Hybrid Text Production English-Spanish

Edited by Rosa Rabadán and Noelia Ramón
© 2025 · Edited Collection · XII, 268 pages
Series: Linguistic Insights, Volume 316

Summary

Hybrid text production today combines machine-translated chunks, human-produced paragraphs modelled on the writer’s L1, and generative AI contributions obtained through prompting. This book explores how corpus-based cross-linguistic studies can contribute to building and optimising Natural Language Generation in multilingual contexts. The aim is to show how rich linguistic annotation can provide valuable data to enhance and refine hybrid text production. Corpus data are mined from CLANES, a multilayer English-Spanish comparable corpus annotated for part-of-speech (PoS), semantic, rhetorical and pragmatic information. The volume highlights the implications of leveraging LLMs for the automatic generation of domain-specific texts. It also addresses the new challenges and opportunities raised by AI-enhanced data augmentation and post-editing, stressing the need for human control over text generation.

Table of Contents

  • Cover
  • Title Page
  • Copyright Page
  • Table of Contents
  • List of Figures
  • List of Tables
  • Introduction: Hybrid text production across languages in the era of generative AI (Rosa Rabadán and Noelia Ramón)
  • CLANES: A multilayer English-Spanish comparable corpus (Hugo Sanjurjo-González)
      • 1. Introduction
      • 2. Related work
      • 3. ACM as corpus analysis software
      • 4. The building of the CLANES corpus
      • 5. Querying the CLANES corpus
      • 6. Conclusions and further work
  • Semantic processing and description of multiword expressions in CLANES (Leticia Moreno-Pérez and Belén López-Arroyo)
      • 1. Introduction
      • 2. Approaches to the management of MWEs
      • 3. Method: processing MWEs in CLANES
      • 4. Results
      • 5. Data applications
      • 6. Conclusions
  • Contrasting pragmatic functions in CLANES: <Recommend> and <Suggest> (María Pérez Blanco and Marlén Izquierdo)
      • 1. Introduction
      • 2. CLANES: a domain-specific corpus for pragmatics research
      • 3. Corpus Pragmatics and Speech Acts
      • 4. Case study
      • 5. Results
      • 6. Discussion
      • 7. Conclusion
  • Unpacking the rhetorical structure of online food and drink descriptions (Isabel Pizarro Sánchez and María Teresa Ortego Antón)
      • 1. Introduction
      • 2. Move analysis and rhetorical annotation in academic and professional genres
      • 3. Materials and methods
      • 4. Rhetorical gold standard
      • 5. Conclusions
  • Human vs. ChatGPT corpus annotation: Data augmentation using LLM fine-tuning (Lucía Sanz-Valdivieso and Belén López-Arroyo)
      • 1. Introduction
      • 2. Natural Language Processing and Text Production Tools
      • 3. Methodology
      • 4. Results and Discussion
      • 5. Conclusion
  • Technology-assisted professional writing for Spanish-speaking users (Belén Labrador and Noelia Ramón)
      • 1. Introduction
      • 2. Working procedure
      • 3. From comparable corpora to text generators
      • 4. Results: ACTRES text-generators
      • 5. Conclusions
  • Post-editing grammatical interference (Rosa Rabadán, Ana García-Gallego and Hugo Sanjurjo-González)
      • 1. Introduction
      • 2. Post-editing
      • 3. Interference
      • 4. Method and data
      • 5. Results and applicability
      • 6. Conclusions
  • Enhancing template-based text generation using Large Language Models (Hugo Sanjurjo-González, Rosa Rabadán and Noelia Ramón)
      • 1. Introduction
      • 2. A brief introduction to text generation technologies
      • 3. Text Generation in ACTRES
      • 4. Proposed enhancements with LLMs
      • 5. Conclusions and further work
  • References
  • Index
  • Notes on Contributors

Rosa Rabadán / Noelia Ramón (eds.)

Cross-linguistic Mediated Communication

Hybrid Text Production English-Spanish

Lausanne · Berlin · Bruxelles · Chennai · New York · Oxford

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet at http://dnb.d-nb.de.

A CIP catalog record for this book has been requested from the Library of Congress.

Library of Congress Control Number: 2025004155

ISBN 978-3-0343-5100-3 (Print)

E-ISBN 978-3-0343-5792-0 (E-PDF)

E-ISBN 978-3-0343-5793-7 (E-PUB)

DOI 10.3726/b22833

Published by Peter Lang Group AG, Lausanne, Switzerland

Any utilization outside the strict limits of the copyright law, without the permission of the publisher, is forbidden and liable to prosecution. This applies in particular to reproductions, translations, microfilming, and storage and processing in electronic retrieval systems.

Details

Pages: XII, 268
Publication Year: 2025
ISBN (PDF): 9783034357920
ISBN (ePUB): 9783034357937
ISBN (Hardcover): 9783034351003
DOI: 10.3726/b22833
Language: English
Publication date: 2025 (July)
Keywords: Corpus Linguistics, Writing aids, Translation, Post-editing, Data augmentation, Generative AI, LLMs, Text generators, Multilayer annotation, Pragmatic annotation, Rhetorical annotation, Fine-tuning, Interference, Multiword expressions, Replicability
Published: Lausanne, Berlin, Bruxelles, Chennai, New York, Oxford, 2025. xii, 268 pp., 50 b/w figures, 30 tables.
Product Safety: Peter Lang Group AG

Biographical notes

Rosa Rabadán is a Professor in the Department of Modern Languages at the University of León, Spain. Her areas of interest include corpus-based English-Spanish contrastive analysis, translation, and language technology. Noelia Ramón is an Associate Professor of English at the University of León. She currently leads the ACTRES research group, which produces expert language data for the development of text-production applications in specialised domains.
