Computing semantic similarity based on novel models of semantic representation using Wikipedia

Cited by: 39
Authors
Qu, Rong [1]
Fang, Yongyi [1]
Bai, Wen [2]
Jiang, Yuncheng [1]
Affiliations
[1] South China Normal Univ, Sch Comp Sci, Guangzhou 510631, Guangdong, Peoples R China
[2] Sun Yat Sen Univ, Collaborat Innovat Ctr High Performance Comp, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantic similarity; Concept similarity; Information content; Feature-based methods; Wikipedia; INFORMATION-CONTENT; BIOMEDICAL DOMAIN; WORDNET; RELATEDNESS; KNOWLEDGE; FEATURES; IDENTIFICATION; COMPUTATION; ONTOLOGIES; RANKING;
DOI
10.1016/j.ipm.2018.07.002
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Computing Semantic Similarity (SS) between concepts is one of the most critical issues in many domains such as Natural Language Processing and Artificial Intelligence. Over the years, several SS measurement methods have been proposed by exploiting different knowledge resources. Wikipedia provides a large domain-independent encyclopedic repository and a semantic network for computing SS between concepts. Traditional feature-based measures rely on linear combinations of different properties and have two main limitations: insufficient information and loss of semantic information. In this paper, we propose several hybrid SS measurement approaches that use the Information Content (IC) and features of concepts and avoid the limitations described above. To integrate discrete properties into one component, we present two models of semantic representation, called CORM and CARM. We then compute SS based on these models and use the IC of categories as a supplement to the SS measurement. The evaluation, based on several widely used benchmarks and a benchmark developed by ourselves, supports these intuitions with respect to human judgments. In summary, our approaches are more efficient in determining SS between concepts and correlate better with human judgments than previous methods such as Word2Vec and NASARI.
Pages: 1002-1021
Page count: 20
Related papers
48 in total
  • [1] Semantic similarity assessment of words using weighted WordNet
    Ahsaee, Mostafa Ghazizadeh
    Naghibzadeh, Mahmoud
    Naeini, S. Ehsan Yasrebi
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2014, 5 (03) : 479 - 490
  • [2] Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features
    Al-Smadi, Mohammad
    Jaradat, Zain
    Al-Ayyoub, Mahmoud
    Jararweh, Yaser
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2017, 53 (03) : 640 - 652
  • [3] [Anonymous], APPL INTELL
  • [4] [Anonymous], 2009, N AM CHAPTER ASS COM
  • [5] [Anonymous], BRIEF BIOINFORMA
  • [6] [Anonymous], 2013, INT C LEARNING REPRE
  • [7] An ontology-based measure to compute semantic similarity in biomedicine
    Batet, Montserrat
    Sanchez, David
    Valls, Aida
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2011, 44 (01) : 118 - 125
  • [8] A neural probabilistic language model
    Bengio, Y
    Ducharme, R
    Vincent, P
    Jauvin, C
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (06) : 1137 - 1155
  • [9] Bollegala D, 2015, COMPUTER SCI, V5, P757
  • [10] NASARI: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities
    Camacho-Collados, Jose
    Pilehvar, Mohammad Taher
    Navigli, Roberto
    [J]. ARTIFICIAL INTELLIGENCE, 2016, 240 : 36 - 64