Measuring Semantic Similarity between Words Using Wikipedia

Cited by: 11
Authors
Lu Zhiqiang [1]
Shao Werimin [1]
Yu Zhenhua [2]
Affiliations
[1] Shanghai Univ, Sch Engn & Comp Sci, Shanghai, Peoples R China
[2] Second Ltd Liabil Co, Shandong, Peoples R China
Keywords
Text semantic similarity; Wikipedia; TF-IDF; cosine similarity
DOI
10.1109/WISM.2009.59
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic similarity measures play an important role in the extraction of semantic relations and are widely used in Natural Language Processing (NLP) and Information Retrieval (IR). This paper presents a new Web-based method for measuring the semantic similarity between words. Unlike methods based on taxonomies or Internet search engines, our method uses snippets from Wikipedia to calculate the semantic similarity between words with TF-IDF weighting and cosine similarity. The snippets are preprocessed with a stemming algorithm and stop-word removal. We evaluate our results under different thresholds in order to reduce the interference from noise and redundancy. Our method was empirically evaluated on the Rubenstein-Goodenough benchmark dataset, where it achieves a correlation of 0.615, higher than some existing methods. The evaluation results show that our method improves accuracy and is more robust for measuring semantic similarity between words.
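The pipeline named in the abstract (stop-word removal, stemming, TF-IDF weighting, cosine similarity, thresholding) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), the smoothed IDF formula, the reading of "threshold" as a cutoff on term weights, and the example snippets are all hypothetical choices made here, since the record does not specify them.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "the", "of", "is", "are", "in", "on",
              "and", "or", "to", "for", "with", "by", "that"}


def crude_stem(token):
    """Rough suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(documents):
    """Return one sparse {term: weight} dict per input document."""
    tokenized = [preprocess(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Relative term frequency times a smoothed IDF (assumed variant).
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def prune(vector, threshold):
    """Keep only terms whose weight is at least `threshold` (assumed reading)."""
    return {t: w for t, w in vector.items() if w >= threshold}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical snippets standing in for text retrieved from Wikipedia.
snippets = [
    "A car is a wheeled motor vehicle used for transportation on roads.",
    "An automobile is a motor vehicle with wheels used for transporting passengers on roads.",
    "A forest is a large area of land covered with trees and undergrowth.",
]
car, automobile, forest = tfidf_vectors(snippets)
print("car vs automobile:", cosine(car, automobile))
print("car vs forest:", cosine(car, forest))
```

In this toy setup the snippets for "car" and "automobile" share several stemmed terms, so their score is higher than that of "car" and "forest", which share none. Scores for a list of word pairs could then be compared against human ratings such as the Rubenstein-Goodenough set to obtain a correlation figure like the one the abstract reports.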
Pages: 251 / +
Page count: 3
Related papers
50 in total
  • [11] An Integrated Approach for Measuring Semantic Similarity between Words and Sentences using Web Search Engine
    Adhikesavan, Kavitha
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2015, 12 (06) : 589 - 596
  • [12] An Efficient Approach for Measuring Semantic Similarity Combining WordNet and Wikipedia
    Li, Fei
    Liao, Lejian
    Zhang, Lanfang
    Zhu, Xinhua
    Zhang, Bo
    Wang, Zheng
    IEEE ACCESS, 2020, 8 : 184318 - 184338
  • [13] Measuring semantic similarity between words by removing noise and redundancy in web snippets
    Xu, Zheng
    Luo, Xiangfeng
    Yu, Jie
    Xu, Weimin
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (18) : 2496 - 2510
  • [14] Assessing Semantic Similarity Between Concepts Using Wikipedia Based on Nonlinear Fitting
    Huang, Guangjian
    Jiang, Yuncheng
    Ma, Wenjun
    Liu, Weiru
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2019, PT II, 2019, 11776 : 159 - 171
  • [15] A Survey on Semantic Similarity between Words in Semantic Web
    Ilakiya, P.
    Sumathi, M.
    Karthik, S.
    2012 INTERNATIONAL CONFERENCE ON RADAR, COMMUNICATION AND COMPUTING (ICRCC), 2012, : 213 - 216
  • [16] Wikipedia bi-linear link (WBLM) model: A new approach for measuring semantic similarity and relatedness between linguistic concepts using Wikipedia link structure
    Hussain, Muhammad Jawad
    Bai, Heming
    Jiang, Yuncheng
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (02)
  • [17] A graph modeling of semantic similarity between words
    Alvarez, Marco A.
    Lim, SeungJin
    ICSC 2007: INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, PROCEEDINGS, 2007, : 355 - +
  • [18] Measure Semantic Similarity between English Words
    Hu, Jinwu
    Dai, Liuling
    Liu, Bin
    PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS, VOLS 1-5, 2008, : 1689 - +
  • [19] Measuring Semantic Relatedness using Wikipedia Signed Network
    Yang, Wen-Teng
    Kao, Hung-Yu
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2013, 29 (04) : 615 - 630
  • [20] Computing semantic similarity based on novel models of semantic representation using Wikipedia
    Qu, Rong
    Fang, Yongyi
    Bai, Wen
    Jiang, Yuncheng
    INFORMATION PROCESSING & MANAGEMENT, 2018, 54 (06) : 1002 - 1021