Translation Studies and Translation Practice: Proceedings of the 2nd International TRANSLATA Conference, 2014
TRANSLATA II was the second in a series of triennial conferences on Translation and Interpreting Studies, held at the University of Innsbruck. The series is conceived as a forum for Translation Studies research. The contributions to this volume focus on humo(u)r translation, legal translation, and human-machine interaction in translation. The contributors also address computer-aided translation, specialised translation, and terminology, as well as audiovisual translation and professional aspects of translation and interpreting.
Extracting Terminology by Language Independent Methods (Sanja Seljan / Ivan Dunđer / Hrvoje Stančić)
Sanja Seljan, Ivan Dunđer & Hrvoje Stančić, University of Zagreb
Abstract: The paper presents the results of automatic term extraction from a digitized monolingual corpus in the pharmaceutical domain, performed by three extraction tools. The results are compared with a reference list, evaluated by F-measure, and analysed for possible integration into the process of digital archiving.
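To make the evaluation step concrete, the following is a minimal sketch of how a list of extracted term candidates might be scored against a reference list. It assumes the balanced F-measure (F1), exact matching after lowercasing and trimming, and a hypothetical function name `evaluate_terms`. The abstract does not specify these details, and the example terms are invented for illustration, not taken from the corpus.

```python
def evaluate_terms(extracted, reference):
    """Return (precision, recall, f1) for extracted terms against a reference list.

    Assumption: terms match exactly after lowercasing and stripping whitespace;
    the matching scheme actually used in the paper may differ.
    """
    extracted_set = {term.strip().lower() for term in extracted}
    reference_set = {term.strip().lower() for term in reference}

    # Terms that the tool proposed and that also appear in the reference list.
    true_positives = len(extracted_set & reference_set)

    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(reference_set) if reference_set else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0

    # Balanced F-measure (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative (invented) term lists.
candidates = ["active substance", "Dosage form", "tablet", "the patient"]
gold = ["active substance", "dosage form", "excipient"]

precision, recall, f1 = evaluate_terms(candidates, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Running the same function on the output of each extraction tool would give directly comparable scores, provided all tools are evaluated against the same reference list.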
Today’s business processes rely heavily on the use of digital and digitized documents. While digitally born and archived documents can be recognised and classified easily, and in some cases automatically, this is not always true of a large set of divergent digitized documents. First, they have to be processed by OCR solutions; subsequently, they have to be recognised, ideally automatically, as belonging to certain types or classes of documents. This is relatively easy to accomplish if there is enough distinguishing information, e.g. a barcode or a uniform heading and subheading structure. However, if the document set comprises many different kinds of documents, as was the case in our research, with scarce layout similarities yet abundant similarities relevant for classification, terminology analysis could be useful. If this proves feasible and efficient, solutions based on this concept could be integrated into the process of digital archiving.