A NEW METHODOLOGY FOR DOMAIN ONTOLOGY CONSTRUCTION FROM THE WEB

Cited by: 6
Authors
Frikh, Bouchra [1]
Djaanfar, Ahmed Said [2]
Ouhbi, Brahim [3]
Affiliations
[1] Ecole Super Technol, Fes, Morocco
[2] Fac Sci Dhar El Mahraz, Lab Informat & Modelisat, Fes, Morocco
[3] Ecole Natl Super Arts & Metiers, Meknes, Morocco
Keywords
Ontology; taxonomy; CHIR; mutual information; information retrieval; documents; similarity; models
DOI
10.1142/S0218213011000565
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Resources such as ontologies are used in many applications, including natural language processing and information retrieval (especially from the Internet). Different methods have been proposed to build such resources. This paper proposes a new method that extracts information from the Web to build a taxonomy of terms and Web resources for a given domain. First, the CHIR method is used to identify candidate terms. Then a similarity measure (SIM) is introduced to select the relevant concepts from which the ontology is built. Our new algorithm, called CHIRSIM, is easy to implement and can be integrated efficiently into an information retrieval system to help improve retrieval performance. Experimental results show that the proposed approach can effectively and efficiently construct a cancer-domain ontology from unstructured text documents.
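The two-step pipeline described in the abstract (score candidate terms, then filter them by similarity) can be illustrated with a small sketch. The abstract does not define CHIR or SIM, so everything below is an assumption. Candidate terms are ranked with a 2x2 chi-square statistic that keeps only terms over-represented in the domain corpus, a simplified stand-in for the positive term-category dependency idea behind CHIR. SIM is replaced by cosine similarity between document co-occurrence vectors. The function names (`candidate_terms`, `select_concepts`), the toy corpora, `top_k=10`, and the `threshold` of 0.4 are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """2x2 chi-square statistic for a term/domain contingency table.

    n11: domain docs containing the term   n10: background docs containing it
    n01: domain docs without the term      n00: background docs without it
    """
    n = n11 + n10 + n01 + n00
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if den == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / den


def candidate_terms(domain_docs, background_docs, top_k=10):
    """Hypothetical CHIR-like step: rank terms positively tied to the domain."""
    vocab = set().union(*domain_docs, *background_docs)
    scores = {}
    for t in vocab:
        n11 = sum(t in d for d in domain_docs)
        n10 = sum(t in d for d in background_docs)
        n01 = len(domain_docs) - n11
        n00 = len(background_docs) - n10
        # Keep only terms that occur in the domain more often than expected
        if n11 * n00 > n10 * n01:
            scores[t] = chi_square(n11, n10, n01, n00)
    # Sort by descending score, ties broken alphabetically
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def context_vector(term, docs):
    """Count the words that share a document with `term` (term itself excluded)."""
    vec = Counter()
    for d in docs:
        if term in d:
            vec.update(w for w in d if w != term)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def select_concepts(seed, candidates, docs, threshold=0.4):
    """Hypothetical SIM step: keep candidates whose context resembles the seed's."""
    seed_vec = context_vector(seed, docs)
    return [t for t in candidates
            if t != seed and cosine(seed_vec, context_vector(t, docs)) >= threshold]


if __name__ == "__main__":
    # Toy corpora: each document is a set of tokens (hypothetical data)
    domain = [{"cancer", "tumor", "chemotherapy"},
              {"cancer", "tumor", "oncology"},
              {"cancer", "chemotherapy", "oncology"}]
    background = [{"football", "goal", "match"},
                  {"cancer", "weather"},
                  {"match", "goal", "weather"}]
    cands = candidate_terms(domain, background)
    print(cands)  # ['cancer', 'chemotherapy', 'oncology', 'tumor']
    print(select_concepts("cancer", cands, domain + background))
    # ['chemotherapy', 'oncology', 'tumor']
    print(select_concepts("cancer", ["tumor", "goal"], domain + background))
    # ['tumor']
```

In this toy run, "cancer" still ranks as a candidate even though it also appears in one background document, because the chi-square filter only asks whether a term is over-represented in the domain set. The similarity step then discards "goal", whose co-occurrence context has almost nothing in common with that of the seed term.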
Pages: 1157-1170
Page count: 14