Translation Studies and Translation Practice: Proceedings of the 2nd International TRANSLATA Conference, 2014

Part 1


Edited By Lew N. Zybatow, Andy Stauder and Michael Ustaszewski

TRANSLATA II was the second in a series of triennial conferences on Translation and Interpreting Studies, held at the University of Innsbruck. The series is conceptualized as a forum for Translation Studies research. The contributions to this volume focus on humo(u)r translation, legal translation, and human-machine interaction in translation. The contributors also address computer-aided translation, specialised translation and terminology, as well as audiovisual translation and professional aspects of translation and interpreting.

Extracting Terminology by Language Independent Methods (Sanja Seljan / Ivan Dunđer / Hrvoje Stančić)


Sanja Seljan, Ivan Dunđer & Hrvoje Stančić, University of Zagreb

Extracting Terminology by Language Independent Methods

Abstract: The paper presents the results of automatic term extraction from a digitized monolingual corpus in the pharmaceutical domain, performed with three extraction tools. The results are compared with a reference list, evaluated by F-measure, and analysed for possible integration into the process of digital archiving.
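
The evaluation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that extracted terms and the reference list are compared as sets after simple case and whitespace normalisation; the function name `evaluate_terms`, the normalisation rule and the sample terms are hypothetical and are not taken from the paper, whose exact matching procedure is not described in this excerpt.

```python
def evaluate_terms(extracted, reference, beta=1.0):
    """Return (precision, recall, f_measure) for an extracted term list.

    Assumption: a term counts as correct only if it matches a reference
    term exactly after lower-casing and collapsing whitespace.
    """
    def normalise(term):
        return " ".join(term.lower().split())

    extracted_set = {normalise(t) for t in extracted}
    reference_set = {normalise(t) for t in reference}

    # Terms proposed by the tool that also appear in the reference list.
    true_positives = len(extracted_set & reference_set)

    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(reference_set) if reference_set else 0.0

    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    # General F-beta; beta = 1 gives the harmonic mean of precision and recall.
    b2 = beta ** 2
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure


# Hypothetical sample data, for illustration only.
extracted = ["active substance", "Dosage form", "the patient"]
reference = ["active substance", "dosage form",
             "marketing authorisation", "excipient"]

precision, recall, f1 = evaluate_terms(extracted, reference)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Running the same function once per extraction tool against a shared reference list would give directly comparable scores. Stricter or looser matching, such as lemmatised or partial matches, would change the counts and is left open here.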

1. Introduction

Today’s business processes rely heavily on the use of digital and digitized documents. While digitally born and archived documents can be easily, and in some cases automatically, recognised and classified, this is not always true of a large set of divergent digitized documents. Firstly, they have to be processed by OCR solutions, and subsequently they have to be recognized, ideally automatically, as pertaining to certain types or classes of documents. This is relatively easy to accomplish if there is enough distinguishing information, e.g. a barcode, a uniform heading and subheading structure etc. However, if the document set comprises many different kinds of documents, as was the case in our research, with scarce layout similarities yet abundant similarities relevant for classification, terminology analysis could be useful. If this proves feasible and efficient, solutions based on this concept could be integrated into the process of digital archiving.

Automatic extraction of corpus-based terminology can help in building terminology lists, which represent a valuable resource for research, education and practical implementation. Specific terminology lists represent an intermediate step...