An efficient approach for measuring semantic relatedness using Wikipedia bidirectional links

Cited by: 5
Authors
Zhu, Xinhua [1]
Guo, Qingsong [1]
Zhang, Bo [1,2]
Li, Fei [3]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Hezhou Univ, Sch Math & Comp Sci, Hezhou 542899, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantic relatedness; Link vector; Vector similarity metric; Disambiguation; Wikipedia; INFORMATION-CONTENT; SIMILARITY; REPRESENTATION; ASSOCIATION;
DOI
10.1007/s10489-019-01452-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Measuring the semantic relatedness between concepts is a fundamental research topic in natural language processing. Among Wikipedia-based measures, the link-based model is the most promising, because Wikipedia's manually defined links are refined and close to human semantics. This paper proposes a Wikipedia two-way link model that extends the existing Wikipedia one-way out-link model; the proposed model is low-dimensional and efficient, and it is easy to implement and reproduce. First, the model combines the out-links and in-links of a concept in Wikipedia into a bidirectional link vector that serves as the concept's semantic interpreter, and it uses a TF*IDF-based bidirectional weighting method to uniformly calculate the strength of the mutual association between a given concept and each of its out-link or in-link concepts. Second, we propose a disambiguation strategy based on the social awareness of senses that ranks the out-links of a disambiguation page in the order in which they occur on that page and uses an adjustable threshold to determine how many senses are selected. Moreover, we propose new vector similarity metrics based on logarithmic and exponential functions to improve the overall performance of semantic relatedness measurements based on Wikipedia links. Experimental results on several well-recognized datasets demonstrate that our model surpasses the popular Naive Explicit Semantic Analysis (Naive-ESA) and Wikipedia Out-Link vector-based Measure (WOLM) methods on current Wikipedia versions, and that our bidirectional link model significantly improves on the performance of the existing one-way link model in practical applications.
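The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the toy link graph `OUT_LINKS`, the smoothed TF*IDF-style weight in `link_vector`, and the helper names are hypothetical, and plain cosine similarity stands in for the logarithm- and exponent-based metrics the paper proposes.

```python
import math
from collections import Counter

# Hypothetical toy link graph: article -> articles it links to (out-links).
OUT_LINKS = {
    "Car": ["Engine", "Wheel", "Road"],
    "Truck": ["Engine", "Wheel", "Cargo"],
    "Engine": ["Car", "Truck", "Fuel"],
    "Wheel": ["Car", "Truck"],
    "Road": ["Car"],
    "Cargo": ["Truck"],
    "Fuel": ["Engine"],
    "Banana": ["Fruit"],
    "Fruit": ["Banana"],
}


def invert(graph):
    """Build the in-link map (article -> articles that link to it)."""
    inv = {article: [] for article in graph}
    for source, targets in graph.items():
        for target in targets:
            inv.setdefault(target, []).append(source)
    return inv


IN_LINKS = invert(OUT_LINKS)


def link_vector(concept):
    """Bidirectional link vector: one weighted dimension per linked concept.

    Assumed weighting (not the paper's formula): tf counts how many
    directions connect the pair (1 or 2 here); the idf-like factor
    down-weights linked concepts that are themselves connected to many
    articles.
    """
    n = len(OUT_LINKS)
    tf = Counter(OUT_LINKS.get(concept, [])) + Counter(IN_LINKS.get(concept, []))
    vector = {}
    for linked, count in tf.items():
        neighbours = set(OUT_LINKS.get(linked, [])) | set(IN_LINKS.get(linked, []))
        df = max(len(neighbours), 1)
        vector[linked] = count * math.log(1 + n / df)
    return vector


def cosine(u, v):
    """Plain cosine similarity between two sparse vectors (a stand-in metric)."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relatedness(a, b):
    """Relatedness of two concepts as similarity of their link vectors."""
    return cosine(link_vector(a), link_vector(b))


def select_senses(ordered_senses, threshold):
    """Keep the first `threshold` senses in disambiguation-page order."""
    return ordered_senses[:threshold]
```

With this toy graph, `relatedness("Car", "Truck")` is positive because the two concepts share the linked concepts "Engine" and "Wheel", while `relatedness("Car", "Banana")` is zero because their link vectors have no dimension in common. The vectors stay short because each concept only has dimensions for the articles it is directly linked with, which is consistent with the low dimensionality the abstract claims for link-based models.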
Pages: 3708-3730
Number of pages: 23