Recent Advances in Digital Humanities

Romance Language Applications

by Madalina Chitez (Volume editor) Anca Dinu (Volume editor) Liviu Dinu (Volume editor) Mihnea Dobre (Volume editor)
©2022 Edited Collection 252 Pages


This volume is addressed to a wide range of scholars interested in the use of digital tools and methods in the humanities. Readers can find examples of new instruments and workflows which attest to successful applications of digital humanities techniques to some (traditional) problems in the scholarship of several disciplines. In addition, the focus on Romance language applications, while capturing specific language processing and analysis challenges, turns this volume into a valuable reference work.

Table Of Contents

  • Cover
  • Title
  • Copyright
  • About the editors
  • About the book
  • This eBook can be cited
  • Table of Contents
  • Introduction (Anca Dinu, Mădălina Chitez, Liviu Dinu and Mihnea Dobre)
  • Section 1: Resources and Digitalisation
  • Chapter 1 Turning 1968 Memories into Usable Texts (Annamaria Goy, Cristina Re, Davide Colla and Marco Leontino)
  • Chapter 2 Phraseology in Romanian Academic Writing: Corpus Based Explorations into Field-Specific Multiword Units (Valentina Mureșan, Roxana Rogobete, Ana-Maria Bucur, Mădălina Chitez and Andreea Dincă)
  • Chapter 3 The Latin Diachronic Database: A New Digital Tool for the Study of Latin (Tommaso Spinelli)
  • Chapter 4 A Proposal for a Multilingual E-Glossary of Discourse Markers (Cecilia Mihaela Popescu and Oana-Adriana Duță)
  • Chapter 5 The Meme That Brings About the Roar. Then, Discredit. The Tismăneanu Case in Pandemic Times (Eugen Istodor)
  • Section 2: Tools and Interfaces
  • Chapter 6 Natural Language Processing for Book Recommender Systems (Haifa Alharthi and Diana Inkpen)
  • Chapter 7 Syntactic Tree Editor (Claudius-Marian Teodorescu)
  • Chapter 8 Cartesian Visual Cosmology: Ways Towards a Digital Platform (Mihnea Dobre, Ovidiu Babeș and Ioana Bujor)
  • Chapter 9 An Evaluation of the Ithaca Tool Performance for Restoring Lost Texts (Ancient Greek) (Alexandra Lițu and Valentin Bottez)
  • Section 3: Computational Methods: Analysis, Classification, Clustering
  • Chapter 10 At the Boundaries of Syntactic Prehistory: Metric and Non-Metric Distances (Andrea Sgarro)
  • Chapter 11 Automatic Authorship Attribution in the Work of Tirso de Molina (Miguel Cavadas Docampo and Pablo Gamallo Otero)
  • Chapter 12 Computational Analysis and Author Detection for Political Discourses of Romanian Presidents (Anca Dinu, Dan Ioan Dobre, Andreea-Codrina Moldovan and Elena-Daniela Nicolescu)
  • Chapter 13 Contemporary Chronological Classifications of Hafez Poetry and Influences on French Literature (Arya Rahgozar, Mehran Rahgozar and Diana Inkpen)
  • Chapter 14 On the Use of Knowledge Graphs for Representing Past Classification Schemes for Various Genres of Literature (Thierry Declerck)
  • Notes on Contributors

Anca Dinu, Mădălina Chitez, Liviu Dinu and Mihnea Dobre


1. Background to this collection

This volume contains selected papers and invited talks from The First Recent Advances in Digital Humanities Conference (RADH), held on the 29th of October 2021, as an online conference, and organized by the Faculty of Foreign Languages and Literatures, Digital Humanities Research Centre, Faculty of History, University of Bucharest in collaboration with the CODHUS Research Centre (Centre for Corpus Related Digital Approaches to Humanities) of the Faculty of Letters, History and Theology at the West University of Timisoara.

The field of Digital Humanities is growing in popularity at an unprecedented speed, while digital methods and tools are multiplying in all domains of the Humanities. It has therefore become a necessity for scholars within the expanding community of Digital Humanities to gather, share ideas, learn from each other, and discuss the most recent trends, challenges, and solutions in the field. The wide range of Digital Humanities topics presented at the conference documents the diversity of current research problems and shows that the domain attracts researchers with different backgrounds, who can only benefit from such interactions.

Most of the papers presented at the RADH2021 conference were prepared during the COVID pandemic. This is not to complain about the lack of materials or access to some resources, but to acknowledge the creativity of the research presented in many of the chapters included in the volume. If there is a lesson to be learned from the pandemic, it is the enormous potential of the Digital Humanities as a field of research. The field needs, however, to foster the growth of digital literacy among scholars in connected fields and, at the same time, to embrace a broader variety of approaches, questions, and formats. Digital Humanities can play the role of a new cultural setting in which to base all sorts of scholarship, widening one’s understanding of the modern world.

Since the central focus of the conference was digitalization, methods and tools for the Romance languages, this volume covers work on most of the Romance languages (Latin, Italian, Romanian, Spanish, French), but also branches out to include English, Arabic and Greek.

The three sections of the conference are reflected in the topic organization of this volume: Resources and digitalization; Tools and interfaces; and Computational methods: analysis, classification, and clustering. The section order reflects the natural lifespan of a digital object, from its creation and representation to its pre-processing and processing and, finally, to its analysis and interpretation.

2. Content: Sections and chapters overview

During the pandemic, many scholars turned to digital resources and expanded the use of digital tools to several areas of research and communication. At a time when mobility restrictions limited the physical space one could access, the virtual world was always available. Social media, conference platforms, eLearning systems, and digital repositories or libraries were quickly adopted by scholars.

The first section of the book, Resources and Digitalisation, has an applied character. It starts from the awareness that Digital Humanities has become the buzzword in academic circles and beyond (Burghardt and Wolff 2017). More and more research initiatives approach traditional or newly emerging topics related to the field of Digital Humanities for higher visibility and faster contribution to scientific development. But such effervescence is challenging to grasp; that is why it is important to obtain the latest information on existing resources and digitalization strategies. At the same time, it is essential to acknowledge emerging synergies between the disciplines that can support further advancements for the larger Digital Humanities domain. One such complementary relationship has been shaped around the discipline of corpus linguistics. These two aspects represent the focus of Section 1 (five chapters). The topic is also adapted to the target languages of the volume, namely Romance languages. Contributions on this topic showcase the process of using corpus linguistics methods to teach and conduct teaching-oriented research in the humanities. Examples of typical tools for text digitalization (for example, OCR techniques for Italian documents) are also provided.

In Chapter 1, “Turning 1968 Memories into Usable Texts”, the authors, Annamaria Goy, Cristina Re, Davide Colla and Marco Leontino, show their readership ways to use Optical Character Recognition (OCR) to read and recover intelligible texts in Italian and prepare them for further analyses. The main findings presented in this study are associated with the project PRiSMHA (Providing Rich Semantic Metadata for Historical Archives).

Chapter 2, “Phraseology in Romanian Academic Writing: Corpus Based Explorations into Field-Specific Multiword Units”, by Valentina Mureșan, Roxana Rogobete, Ana-Maria Bucur, Mădălina Chitez and Andreea Dincă, draws on a comparative analysis of two self-compiled corpora (ROGER Corpus of Romanian Academic Genres and EXPRES Corpus of Expert Writing in Romanian and English), which are used to test a computational method for extracting field-specific multi-word units in Romanian academic writing. The findings of the study serve as a first stage in the development of a collocation or n-gram extraction method that would support the creation of academic word / collocation / phrase lists for the Romanian language.
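To make the idea of n-gram extraction concrete, the following is a minimal sketch in Python, not the method used in Chapter 2. The function names, the whitespace tokenization, the frequency threshold and the three sample sentences are all assumptions made for illustration; a real pipeline for ROGER or EXPRES would need proper tokenization and statistical filtering.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return all contiguous n-token sequences from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(sentences, n=2, min_count=2):
    """Count n-grams across sentences and keep those seen at least min_count times.

    Tokenization here is a naive lowercase whitespace split (an assumption).
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(extract_ngrams(tokens, n))
    return {ngram: c for ngram, c in counts.items() if c >= min_count}


# Hypothetical sample sentences, not taken from either corpus.
sample = [
    "în ceea ce privește rezultatele",
    "în ceea ce privește metoda",
    "rezultatele arată că metoda funcționează",
]

trigrams = frequent_ngrams(sample, n=3, min_count=2)
for ngram, count in sorted(trigrams.items()):
    print(" ".join(ngram), count)
```

Raw frequency is only the simplest possible filter; association measures are commonly used to separate genuine collocations from sequences that are merely frequent.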

Tommaso Spinelli’s research (Chapter 3), “The Latin Diachronic Database: A New Digital Tool for the Study of Latin”, highlights a new digital toolkit that helps scholars and students to identify and analyse the use of Latin in literary texts, from the perspective of both word frequency and development over time.

In Chapter 4, “A Proposal for a Multilingual E-Glossary of Discourse Markers”, Cecilia Mihaela Popescu and Oana-Adriana Duță aim to establish an e-glossary of discourse markers in English and four Romance languages, with the help of digital tools; this process would help scholars in their linguistic research endeavours.

Section 1 closes with Eugen Istodor’s chapter (Chapter 5), “The Meme That Brings About the Roar. Then, Discredit. The Tismăneanu Case in Pandemic Times”, which offers a critical analysis of how social codes and social media were reshaped early in the pandemic. Istodor examines the case of Vladimir Tismăneanu, professor of political science in the United States and an important public figure in the Romanian cultural space. The use of social media, and especially the culture of sharing encouraged by social media platforms, brings into question the relation between private and public space, but also the shifted boundaries between humour and discrimination in the case of a meme shared early in the pandemic.

The second section of the volume, Tools and Interfaces, puts forth four chapters that propose new tools or evaluate existing ones.

Because electronic resources such as corpora, archives, and repositories have multiplied over the last decade and the quantity of data generated daily is overwhelming, managing such data by hand or with only a few specialized professionals has become simply unfeasible. The sheer quantity and complexity of data now require increasingly user-friendly and intelligent tools and platforms for processing. In almost every domain of the Humanities, the emergence of such new tools and interfaces is a reality. For instance, in disciplines such as the history of philosophy and science, which rely more on traditional library and archival research, the use of digital resources was, during the pandemic, the predominant way of doing research. The four chapters in this section are a fair sample of that research phenomenon.

The section begins with Chapter 6, “Natural Language Processing for Book Recommender Systems”, by Haifa Alharthi and Diana Inkpen, who describe several approaches to a literary book recommender system that takes the text of the books into consideration. One approach considers the writing style of the authors by transferring the information learned by an author-identification model into a book recommender system. Another approach represents books using more than one hundred linguistic features, including lexical, syntactic, stylometric, and fiction-based features. The two proposed systems are evaluated in a top-k recommendation scenario, and both provide higher recommendation accuracy than state-of-the-art content and collaborative filtering methods.
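For readers unfamiliar with top-k evaluation, the short Python sketch below shows one common way to score a ranked recommendation list, namely precision at k. The text above does not say which metric Chapter 6 reports, so the choice of precision at k, the function name and the book identifiers are assumptions made purely for illustration.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommended items that the reader actually liked.

    recommended: list of item ids, best-ranked first.
    relevant: set of item ids known to be relevant for this reader.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical ranking and reader preferences.
ranked = ["book_a", "book_b", "book_c", "book_d", "book_e"]
liked = {"book_b", "book_e", "book_z"}

score = precision_at_k(ranked, liked, k=3)
print(f"precision@3 = {score:.2f}")
```

In practice such a score is averaged over many readers, and other top-k measures (recall at k, for instance) are often reported alongside it.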

Chapter 7, “Syntactic Tree Editor”, authored by Claudius-Marian Teodorescu, presents the development of an editor for syntactic trees, to be used during the process of teaching grammar in universities or for research purposes. It complies with the Government and Binding (GB) model, but there is the possibility of extending it to support various other grammar models. The editor was designed after a comprehensive analysis of the existing software products for editing or generating syntax trees. The functionality of the tree editor and its user-friendly interface are presented in detail.

The next chapter (Chapter 8), “Cartesian Visual Cosmology: Ways Towards a Digital Platform”, co-authored by Mihnea Dobre, Ovidiu Babeș and Ioana Bujor, details the work of collecting, cataloguing, and presenting in a digital form a variety of early modern cosmological images. The corpus is based on early modern prints, already digitized, but never collected as “cosmological illustrations”. It offers an entry point in the study of the imagery associated with the philosophical system of René Descartes, and details how a Digital Humanities approach to the subject can enrich early modern scholarship. The chapter includes a discussion of several digital tools, and open research questions about Cartesian natural philosophy, but also about the use of images at the time of the scientific revolution of the early modern period.

The last chapter of the section (Chapter 9), “An Evaluation of the Ithaca Tool Performance for Restoring Lost Texts (Ancient Greek)”, co-authored by Alexandra Lițu and Valentin Bottez, aims to evaluate the performance of the Ithaca tool on a funerary inscription, coming most probably from Callatis, a Greek colony on the Western shore of the Black Sea, that was fully preserved and published recently by the authors. To overcome the requirement of having at least 50 characters in order to use Ithaca, they performed some text augmentation. However, these augmentations might have proved influential for geographical and chronological attribution. One of the salient features of Ithaca seems to be its receptiveness to context. It also proved itself very sensitive to slight changes in the input text, even on the level of basic punctuation.


ISBN (Hardcover)
Publication date
2022 (November)
Berlin, Bern, Bruxelles, New York, Oxford, Warszawa, Wien, 2022. 252 pp., 68 fig. b/w, 20 tables.

Biographical notes


Anca Dinu is Assistant Professor at the University of Bucharest, Faculty of Foreign Languages and Literatures, and director of The Digital Humanities Research Centre, University of Bucharest. Her main research interests are Digital Humanities, Natural Language Processing, formal and distributional semantics, corpus linguistics, and experimental linguistics.

Madalina Chitez is a Senior Researcher in Applied Corpus Linguistics at the West University of Timisoara, Romania. She is the founder and coordinator of the Digital Humanities research centre, CODHUS (Centre for Corpus Related Digital Approaches to Humanities), which has a strong applied language-technology character. Her areas of interest and expertise are applied corpus linguistics, digital humanities, academic writing, contrastive linguistics and computer-assisted language learning.

Liviu Dinu is Professor at the University of Bucharest, Faculty of Mathematics and Computer Science, Computer Science Department, and director of the Human Language Technologies Research Center (nlp.unibuc.ro). His main research interests are Computational Linguistics and Natural Language Processing, with a particular focus on language similarity, computational approaches to historical linguistics, authorship identification and computational stylometry, topic analysis and text categorization.

Mihnea Dobre teaches and does research in the history of philosophy and science at the University of Bucharest. His principal interests are the relations between philosophy, religion and science in the early modern period and how new forms of scholarship, such as digital humanities, can inform research practice on these topics.

