Ontology-based approach for measuring semantic similarity

Cited by: 54
Authors
Taieb, Mohamed Ali Hadj [1]
Ben Aouicha, Mohamed [1]
Ben Hamadou, Abdelmajid [1]
Affiliations
[1] Sfax Univ, Multimedia Informat Syst & Adv Comp Lab, Sfax 3021, Tunisia
Keywords
Semantic similarity; WordNet ontology; Taxonomic knowledge; Taxonomical parameters; RELATEDNESS; FEATURES; DOMAIN
DOI
10.1016/j.engappai.2014.07.015
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The challenge in measuring semantic similarity between words is to find a method that can simulate the human thinking process. The use of computers to quantify and compare semantic similarities has become an important area of research in various fields, including artificial intelligence, knowledge management, information retrieval and natural language processing. The development of efficient measures for computing concept similarity is fundamental to computational semantics. Several computational measures rely on knowledge resources, such as the WordNet "is a" taxonomy, to quantify semantic similarity. Several of these measures are based on taxonomical parameters to achieve the best possible expression of the semantics of content. This paper presents a new measure for quantifying the degree of semantic similarity between concepts and words based on the WordNet hierarchy and using a number of topological parameters related to the "is a" taxonomy. Our proposal combines, in a complementary way, the hyponyms and depth parameters. This measure takes the problem of fine granularity into account. It is argued, however, that WordNet sense distinctions are highly fine-grained even for humans. We, therefore, propose a new method to quantify the hyponyms subgraph of a given concept based on depth distribution. Common nouns datasets (RG65, MC30 and AG203), a medical terms dataset (MED38) and a verbs dataset (YP130), each formed by word pairs, are used in the assessment. We start by calculating semantic similarities and then compute the correlation coefficient between human judgement and computational measures. The results demonstrate that, compared to other currently available computational methods, the measure presented in this study yields better performance. Compared to several measures, it shows good accuracy while covering all the pairs of the verbs dataset YP130. (C) 2014 Elsevier Ltd. All rights reserved.
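The evaluation step described in the abstract (score each word pair, then correlate the scores with human judgement) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say which correlation coefficient is used, so Pearson's r is an assumption here, and the function name `pearson` and the score lists are hypothetical placeholders rather than values from RG65 or any other dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances are left unnormalised; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT taken from any benchmark.
human_scores = [3.9, 3.5, 0.4, 1.2]        # hypothetical human ratings per word pair
measure_scores = [0.95, 0.80, 0.10, 0.35]  # hypothetical computed similarities

r = pearson(human_scores, measure_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the computed similarities rank and scale the word pairs much as the human raters did.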
Pages: 238-261
Page count: 24