Learning domain ontologies from document warehouses and dedicated web sites

Cited by: 187
Authors
Navigli, R [1]
Velardi, P [1]
Affiliations
[1] Univ Roma La Sapienza, Dipartimento Informat, I-00198 Rome, Italy
DOI
10.1162/089120104323093276
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a method and a tool, OntoLearn, aimed at the extraction of domain ontologies from Web sites, and more generally from documents shared among the members of virtual organizations. OntoLearn first extracts a domain terminology from available documents. Then, complex domain terms are semantically interpreted and arranged in a hierarchical fashion. Finally, a general-purpose ontology, WordNet, is trimmed and enriched with the detected domain concepts. The major novel aspect of this approach is semantic interpretation, that is, the association of a complex concept with a complex term. This involves finding the appropriate WordNet concept for each word of a terminological string and the appropriate conceptual relations that hold among the concept components. Semantic interpretation is based on a new word sense disambiguation algorithm, called structural semantic interconnections.
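The first two stages named in the abstract (terminology extraction, then hierarchical arrangement of complex terms) can be illustrated with a toy sketch. This is not OntoLearn's implementation: the bigram-frequency filter, the head-word grouping, the function names `extract_terms` and `arrange_by_head`, and the sample sentences are all assumptions made for illustration. The semantic interpretation step (WordNet sense disambiguation via structural semantic interconnections) is left out entirely.

```python
from collections import defaultdict


def extract_terms(domain_docs, min_count=2):
    """Toy terminology extraction (an assumption, not OntoLearn's filter):
    keep word bigrams that occur at least min_count times in the corpus."""
    counts = defaultdict(int)
    for doc in domain_docs:
        words = doc.lower().split()
        for first, second in zip(words, words[1:]):
            counts[(first, second)] += 1
    return sorted(" ".join(pair) for pair, n in counts.items() if n >= min_count)


def arrange_by_head(terms):
    """Toy hierarchy (an assumption): file each complex term under its
    head word, taken here to be the last word of the term."""
    tree = defaultdict(list)
    for term in terms:
        head = term.split()[-1]
        tree[head].append(term)
    return dict(tree)


# Hypothetical two-sentence "domain corpus".
docs = [
    "the bus service and the ferry service run daily",
    "book a bus service or a ferry service online",
]
terms = extract_terms(docs)
hierarchy = arrange_by_head(terms)
print(terms)
print(hierarchy)
```

In this sketch "bus service" and "ferry service" both end up under the head "service", which mirrors, very loosely, the idea of arranging complex domain terms hierarchically before linking them to a general-purpose ontology.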
Pages: 151-179
Number of pages: 29