Computing semantic similarity based on novel models of semantic representation using Wikipedia

Times cited: 39

Authors:

Qu, Rong [1]

Fang, Yongyi [1]

Bai, Wen [2]

Jiang, Yuncheng [1]

Affiliations:

[1] South China Normal Univ, Sch Comp Sci, Guangzhou 510631, Guangdong, Peoples R China

[2] Sun Yat Sen Univ, Collaborat Innovat Ctr High Performance Comp, Guangzhou 510006, Guangdong, Peoples R China

Source:

INFORMATION PROCESSING & MANAGEMENT | 2018 / Vol. 54 / Issue 06

Funding:

National Natural Science Foundation of China;

Keywords:

Semantic similarity; Concept similarity; Information content; Feature-based methods; Wikipedia; INFORMATION-CONTENT; BIOMEDICAL DOMAIN; WORDNET; RELATEDNESS; KNOWLEDGE; FEATURES; IDENTIFICATION; COMPUTATION; ONTOLOGIES; RANKING;

DOI:

10.1016/j.ipm.2018.07.002

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Computing Semantic Similarity (SS) between concepts is one of the most critical issues in many domains, such as Natural Language Processing and Artificial Intelligence. Over the years, several SS measurement methods have been proposed that exploit different knowledge resources. Wikipedia provides a large, domain-independent encyclopedic repository and a semantic network for computing SS between concepts. Traditional feature-based measures rely on linear combinations of different properties and suffer from two main limitations: insufficient information and loss of semantic information. In this paper, we propose several hybrid SS measurement approaches that use the Information Content (IC) and features of concepts and avoid these limitations. To integrate discrete properties into a single component, we present two models of semantic representation, called CORM and CARM. We then compute SS based on these models and use the IC of categories as a supplement to SS measurement. The evaluation, based on several widely used benchmarks and a benchmark we developed ourselves, supports these intuitions with respect to human judgments. In summary, our approaches are more efficient in determining SS between concepts and correlate better with human judgments than previous methods such as Word2Vec and NASARI.
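The abstract names the Information Content (IC) of categories as one ingredient of the proposed measures. As a generic illustration only (this is not the paper's CORM or CARM model, which the abstract does not define), the sketch below computes an intrinsic IC in the style of Seco et al. (2004), IC(c) = 1 - log(hypo(c) + 1) / log(N), over a small hypothetical category hierarchy and plugs it into Lin's (1998) IC-based similarity. The toy hierarchy, the function names, and the choice of Seco's formula and Lin's measure are all assumptions made for illustration, not details taken from the paper.

```python
import math

# Hypothetical toy category hierarchy (NOT real Wikipedia data):
# each category maps to its list of parent categories.
PARENTS: dict[str, list[str]] = {
    "Animals": [],
    "Mammals": ["Animals"],
    "Birds": ["Animals"],
    "Dogs": ["Mammals"],
    "Cats": ["Mammals"],
    "Sparrows": ["Birds"],
}


def children_map(parents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the child -> parents map into a parent -> children map."""
    children: dict[str, list[str]] = {c: [] for c in parents}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    return children


def descendants(node: str, children: dict[str, list[str]]) -> set[str]:
    """All categories strictly below `node` (iterative DFS, safe on DAGs)."""
    seen: set[str] = set()
    stack = list(children[node])
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children[n])
    return seen


def ancestors(node: str, parents: dict[str, list[str]]) -> set[str]:
    """`node` itself plus every category above it."""
    seen = {node}
    stack = list(parents[node])
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen


def intrinsic_ic(parents: dict[str, list[str]]) -> dict[str, float]:
    """Seco-style intrinsic IC; assumes the hierarchy has at least 2 categories."""
    children = children_map(parents)
    n = len(parents)
    return {
        c: 1.0 - math.log(len(descendants(c, children)) + 1) / math.log(n)
        for c in parents
    }


def lin_similarity(a: str, b: str, parents: dict[str, list[str]],
                   ic: dict[str, float]) -> float:
    """Lin: 2 * IC(MICA) / (IC(a) + IC(b)), MICA = most informative common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    if not common:
        return 0.0
    mica_ic = max(ic[c] for c in common)
    denom = ic[a] + ic[b]
    if denom == 0.0:
        # Only reachable when both inputs are the same root-like category.
        return 1.0 if a == b else 0.0
    return 2.0 * mica_ic / denom


if __name__ == "__main__":
    ic = intrinsic_ic(PARENTS)
    print(round(lin_similarity("Dogs", "Cats", PARENTS, ic), 4))  # 0.3869
    print(lin_similarity("Dogs", "Sparrows", PARENTS, ic))        # 0.0
```

Because this intrinsic IC is derived from the shape of the hierarchy alone, it needs no corpus counts; substituting a corpus-based IC would change only `intrinsic_ic`, while `lin_similarity` stays the same.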


Pages: 1002-1021

Page count: 20

Number of references: 48