The Syntax, Semantics, and Pragmatics of Japanese and Chinese
WONG PING WAI: Semantic Annotation of Chinese Texts with Message Structures Based on HowNet 271
1. Introduction

Corpus annotation is not just a practical task of adding linguistic information to plain texts; it also sheds new light on the nature of language and the most effective means of analyzing it. This chapter reports on the task of using a knowledge base called HowNet to annotate Chinese texts with semantic information. The annotation method is Message Structure, which provides an effective way to analyze Chinese word senses and the semantic dependencies between words.

2. Corpus and Corpus Annotation

A corpus is a collection of texts, usually in electronic form, which may be processed by computers for various purposes, such as linguistic research and information technology. A corpus is useful only if we can extract information from it. However, only limited information can be retrieved directly from a raw corpus, since linguistic information is always implicit in plain texts. That is why we need to make such implicit information explicit by building interpretative linguistic information into the corpus. This process is called corpus annotation.

3. Annotated Chinese Corpora

Efforts to annotate Chinese corpora began in the 1990s. Examples include tokenized corpora, e.g., the PH Corpus (Guo 1993); part-of-speech tagged corpora, e.g., the Sinica Corpus (CKIP 1995) and the PKU corpus (Yu et al. 2003); syntactically annotated corpora, e.g., the Sinica Treebank (Huang et al. 2000) and the Penn Chinese Treebank...