Corpus-based knowledge representation in specialized domains (Assunta Caruso / Antonietta Folino)
1. Introduction

The advantages of using corpus tools in terminological work are by now well established (Bowker 1996; Bowker and Pearson 2002; Meyer and Mackintosh 1996; Pearson 1998). In particular, creating thesauri through a corpus-based approach makes it possible to extract terms that are actually used in current written language, according to evidence-based linguistic criteria. Indeed, criteria such as representativeness and balance, established by corpus linguistics as indexes of a well-constructed corpus, should be considered in this particular use of corpora, i.e. defining terminological resources that describe a specific domain in a comprehensive manner. Accordingly, the quality of the corpus could be measured a posteriori, by evaluating the quantity and the representativeness of the extracted terms.

To this end, this paper presents the compilation of a specialized comparable corpus in the domain of tourism, followed by the construction of a controlled vocabulary whose function will consist in domain knowledge representation, terminological control, indexing and information retrieval.

The work described in this paper has been conducted within the framework of the project DiCeT-INMOTO-OR.C.HE.S.T.R.A [2], part of

[1] Although the authors have cooperated in the research work and in writing the paper, they have individually devoted specific attention to the following sections: Caruso: 1; 2; 3 (3.1 – 3.1.1 – 3.1.2); Folino: 3 (3.2 – 3.3); 4.

[2] DiCeT – LivingLab Di Cultura e Tecnologia; INMOTO – INformation and MObility for TOurism; OR.C.HE.S.T.R.A....