
«Quo vadis, Kommunikation?» Kommunikation – Sprache – Medien / «Quo vadis, Communication?» Communication – Language – Media

Akten des 46. Linguistischen Kolloquiums in Sibiu 2011 / Proceedings of the 46th Linguistics Colloquium, Sibiu 2011

Edited by Ioana-Narcisa Cretu

Quo vadis, Communication? Communication – Language – Media presents the proceedings of the 46th Linguistics Colloquium at the Lucian Blaga University of Sibiu (Hermannstadt), Romania. The essays examine the role of media in present-day communication and show how media have become the subject of research in different linguistic fields: they serve as both the starting point and the field of application for reflections on the traditional core areas of linguistics and on applied linguistics. The volume comprises papers in German, English and French from 30 different universities in 14 countries.

Extracting Translation Equivalents from Monolingual Corpora for Statistical Machine Translation

1    Introduction

As numerous machine translation (MT) contests and successful commercial systems such as Google Translate or Bing Translator show, statistical machine translation (SMT), often enriched with some hybrid elements, works well in practice. It also has the advantage that new language pairs can be added in a fraction of the time that rule-based translation used to require. However, SMT is based on parallel corpora, which are hard to acquire for many language pairs, especially those involving lesser-used languages. Interlingua-based systems can sometimes resolve this data acquisition bottleneck, but their drawback is lower translation quality, as two error-prone translation processes have to be conducted in sequence.

In the work presented here, which was conducted in the EU FP7 project Monotrans (2009 to 2011), we investigated another solution to the data acquisition bottleneck, one that requires only a bilingual dictionary and large monolingual rather than parallel corpora. It is based on a two-stage procedure. In the first stage, the translation equivalents of n-grams of up to length five are determined by generating a number of translation candidates and then selecting the candidate with the highest occurrence frequency in the target-language corpus. The generation of the translation candidates is based on a bilingual dictionary, but can optionally also draw on thesauri of related words of the source or the target language. In the second stage, full sentences are translated by combining the translation equivalents of the...
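The first stage described above can be illustrated with a minimal Python sketch. This is not the Monotrans implementation: the function names `ngram_counts` and `translate_ngram`, the toy dictionary, and the toy corpus are all hypothetical. The sketch also assumes that candidates are formed word by word in source order, and it leaves out the optional thesaurus expansion.

```python
from collections import Counter
from itertools import product


def ngram_counts(tokens, max_n=5):
    """Count every n-gram of length 1..max_n in a tokenized target corpus."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def translate_ngram(source_ngram, dictionary, target_counts):
    """Pick the candidate translation that is most frequent in the target corpus.

    Assumption: candidates keep the source word order, and a word missing
    from the dictionary is passed through unchanged.
    """
    options = [dictionary.get(word, [word]) for word in source_ngram]
    candidates = product(*options)  # all word-by-word combinations
    return max(candidates, key=lambda cand: target_counts[cand])


# Hypothetical toy data (German to English), for illustration only.
toy_dictionary = {
    "stark": ["powerful", "strong"],
    "kaffee": ["coffee"],
}
toy_corpus = "she drinks strong coffee and he drinks strong coffee too".split()
toy_counts = ngram_counts(toy_corpus)

print(translate_ngram(("stark", "kaffee"), toy_dictionary, toy_counts))
# prints ('strong', 'coffee')
```

Here the dictionary alone cannot decide between "powerful coffee" and "strong coffee"; the monolingual target corpus settles the choice because only the second candidate occurs in it.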
