Measuring Semantic Similarity between Words Using Wikipedia

Cited by: 11
Authors
Lu Zhiqiang [1 ]
Shao Werimin [1 ]
Yu Zhenhua [2 ]
Affiliations
[1] Shanghai Univ, Sch Engn & Comp Sci, Shanghai, Peoples R China
[2] Second Ltd Liabil Co, Shandong, Peoples R China
Keywords
Text semantic similarity; Wikipedia; TF-IDF; cosine similarity
DOI
10.1109/WISM.2009.59
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic similarity measures play an important role in the extraction of semantic relations and are widely used in Natural Language Processing (NLP) and Information Retrieval (IR). This paper presents a new Web-based method for measuring the semantic similarity between words. Unlike other methods, which are based on a taxonomy or on Internet search engines, our method uses snippets from Wikipedia to calculate the semantic similarity between words using cosine similarity and TF-IDF. A stemming algorithm and stop-word removal are used to preprocess the Wikipedia snippets. We evaluate our results under different thresholds in order to reduce interference from noise and redundancy. Our method was empirically evaluated on the Rubenstein-Goodenough benchmark dataset. It achieves a higher correlation (0.615) than some existing methods. The evaluation results show that our method improves accuracy and is more robust for measuring semantic similarity between words.
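The pipeline named in the abstract (stop-word removal, stemming, TF-IDF weighting, cosine similarity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the `crude_stem` suffix stripper, the smoothed IDF formula, and all function names are hypothetical stand-ins. The abstract does not say what its thresholds apply to, so the `min_weight` cutoff on term weights is only a guess, and the snippets are expected to be supplied by the caller.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or",
              "is", "are", "to", "for", "by", "with"}


def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(documents, min_weight=0.0):
    """Return one sparse TF-IDF vector (a dict) per tokenized document.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms shared
    by every document keep a non-zero weight. Weights below `min_weight`
    are dropped (hypothetical noise threshold).
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        vec = {}
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            weight = tf * idf
            if weight >= min_weight:
                vec[term] = weight
        vectors.append(vec)
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(snippets_a, snippets_b, min_weight=0.0):
    """Score two words by comparing the snippets collected for each."""
    doc_a = preprocess(" ".join(snippets_a))
    doc_b = preprocess(" ".join(snippets_b))
    vec_a, vec_b = tfidf_vectors([doc_a, doc_b], min_weight)
    return cosine_similarity(vec_a, vec_b)


if __name__ == "__main__":
    # Placeholder sentences written for this sketch, not real Wikipedia text.
    car = ["A car is a wheeled motor vehicle used for transport."]
    automobile = ["An automobile is a motor vehicle with wheels."]
    print(word_similarity(car, automobile))
```

Because TF-IDF weights are non-negative, the score lies between 0 (no shared stemmed terms) and 1 (identical weighted term profiles). Scores of this kind could then be correlated with the human ratings in a benchmark such as Rubenstein-Goodenough.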
Pages: 251 / +
Page count: 3
Related papers
50 in total
  • [1] Measuring Semantic Similarity between Words Using HowNet
    Dai, Liuling
    Liu, Bin
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, 2008, : 601 - +
  • [2] Measuring Semantic Similarity between Words Using Web Documents
    Takale, Sheetal A.
    Nandgaonkar, Sushma S.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2010, 1 (04) : 78 - 85
  • [3] An approach for measuring semantic similarity between Wikipedia concepts using multiple inheritances
    Hussain, Muhammad Jawad
    Wasti, Shahbaz Hassan
    Huang, Guangjian
    Wei, Lina
    Jiang, Yuncheng
    Tang, Yong
    INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (03)
  • [4] Measuring semantic similarity between words using multiple information sources
    Lei, Jingsheng
    Journal of Information and Computational Science, 2010, 7 (02) : 601 - 608
  • [5] Capturing Semantic Similarity for Words in Wikipedia with Random Walk
    Duan, Jianyong
    Cui, Jiayuan
    Wu, Mingli
    Wang, Hao
    PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018, : 709 - 713
  • [6] An approach for measuring semantic similarity between words using multiple information sources
    Li, YH
    Bandar, ZA
    McLean, D
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2003, 15 (04) : 871 - 882
  • [7] Measuring semantic similarity between words using lexical knowledge and neural networks
    Li, YH
    Bandar, Z
    Mclean, D
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2002, 2002, 2412 : 111 - 116
  • [8] A fuzzy approach for measuring the semantic similarity between words in WordNet
    Song, Ling
    Ma, Jun
    Lei, Jingsheng
    Li, Chao
    Journal of Information and Computational Science, 2009, 6 (03) : 1673 - 1680
  • [9] An Ontology-Based Approach for Measuring Semantic Similarity Between Words
    Zhang, Ruiling
    Xiong, Shengwu
    Chen, Zhong
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, ICIC 2015, PT III, 2015, 9227 : 510 - 516
  • [10] Measuring Semantic Similarity between Words Based on Multiple Relational Information
    Duan, Jianyong
    Wu, Yuwei
    Wu, Mingli
    Wang, Hao
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (01) : 163 - 169