Corpus Data across Languages and Disciplines

Piotr Pezik

In recent years, corpus tools and methodologies have gained widespread recognition in various areas of theoretical and applied linguistics. Data stored in corpora is explored and exploited across languages and disciplines as distinct as historical linguistics, language didactics, discourse analysis, machine translation and search engine development, to name but a few. This volume contains a selection of papers presented at the 8th edition of the Practical Applications in Language and Computers conference. It is aimed at helping a wide community of researchers, language professionals and practitioners keep up to date with new corpus theories and methodologies as well as language-related applications of computational tools and resources.

i-Publisher, i-Librarian and EUDocLib – Linguistic Services for the Web: Anelia Belogay, Diman Karagiozov, Damir Ćavar, Dan Cristea, Svetla Koeva, Roumen Nikolov, Maciej Ogrodniczuk, Adam Przepiórkowski, Polivios Raxis and Cristina Vertan

Extract

Abstract

This paper presents three linguistically-aware online services built on top of the multilingual framework prepared for the ICT PSP EU-co-financed project ATLAS (Applied Technology for Language-Aided CMS). The framework aims to use state-of-the-art text processing methods to extract information and cluster documents. These building blocks provide the basis for advanced CMS functions such as automatic categorization and text summarization. i-Publisher is a Web-based content management platform for visual website building which integrates linguistic features to improve content navigation, e.g. by interlinking documents based on extracted phrases, words and names, and by providing short summaries and suggested classification concepts. Apart from comprehensive language analysis, the CMS supports data visualisation as well as large-volume multilingual data storage and maintenance. i-Librarian is a sample online service built with and on top of i-Publisher (as a content management layer) to illustrate the benefits of applying language technology to content administration. It allows visitors to maintain a personal workspace for storing, sharing and publishing various types of documents and to have them automatically categorized, summarized and annotated with important words, phrases and names. It also allows similar documents in different languages to be found easily. The EUDocLib service, based on a collection of EU legal documents, illustrates how users can easily find similar documents, obtain the summaries of desired...
