Learning domain ontologies from document warehouses and dedicated web sites

Cited by: 187

Authors:

Navigli, R [1]

Velardi, P [1]

Affiliations:

[1] Univ Roma La Sapienza, Dipartimento Informat, I-00198 Rome, Italy

Source:

COMPUTATIONAL LINGUISTICS | 2004 / Vol. 30 / Issue 2

DOI:

10.1162/089120104323093276

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a method and a tool, OntoLearn, aimed at the extraction of domain ontologies from Web sites, and more generally from documents shared among the members of virtual organizations. OntoLearn first extracts a domain terminology from available documents. Then, complex domain terms are semantically interpreted and arranged in a hierarchical fashion. Finally, a general-purpose ontology, WordNet, is trimmed and enriched with the detected domain concepts. The major novel aspect of this approach is semantic interpretation, that is, the association of a complex concept with a complex term. This involves finding the appropriate WordNet concept for each word of a terminological string and the appropriate conceptual relations that hold among the concept components. Semantic interpretation is based on a new word sense disambiguation algorithm, called structural semantic interconnections.

Pages: 151-179

Page count: 29
