Measuring Semantic Similarity between Words Using Wikipedia

Cited by: 11
Authors
Lu Zhiqiang [1]
Shao Werimin [1]
Yu Zhenhua [2]
Affiliations
[1] Shanghai Univ, Sch Engn & Comp Sci, Shanghai, Peoples R China
[2] Second Ltd Liabil Co, Shandong, Peoples R China
Keywords
Text semantic similarity; Wikipedia; TF-IDF; cosine similarity
DOI
10.1109/WISM.2009.59
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Semantic similarity measures play an important role in extracting semantic relations and are widely used in Natural Language Processing (NLP) and Information Retrieval (IR). This paper presents a new Web-based method for measuring the semantic similarity between words. Unlike other methods, which rely on a taxonomy or on Internet search engines, our method uses snippets from Wikipedia to calculate the semantic similarity between words with TF-IDF weighting and cosine similarity. The snippets are preprocessed with a stemming algorithm and stop-word removal. To reduce interference from noise and redundancy, we evaluate our results under different thresholds. The method was evaluated empirically on the Rubenstein-Goodenough benchmark dataset, where it achieves a higher correlation (0.615) than some existing methods. The evaluation results show that our method is more accurate and more robust for measuring semantic similarity between words.
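The abstract describes the pipeline only at a high level. The following is a minimal, illustrative Python sketch of that kind of pipeline under stated assumptions, not the authors' implementation: Wikipedia retrieval is not shown, so each word's snippets are passed in as plain strings; `crude_stem` is a toy suffix stripper standing in for a real stemming algorithm such as Porter's; `STOP_WORDS` is a small hypothetical list; IDF is smoothed as log((1 + N) / (1 + df)) + 1 so that terms shared by both words are not weighted to zero; and `min_weight` is a hypothetical reading of the paper's noise threshold.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real system would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "or", "in", "on",
              "to", "for", "with", "by", "as", "it", "that", "this", "be"}


def crude_stem(token):
    """Toy suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, split into alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(documents, min_weight=0.0):
    """Return one sparse {term: weight} dict per document.

    Weight = raw term frequency * smoothed IDF, log((1 + N) / (1 + df)) + 1.
    Terms whose weight is below min_weight are dropped (assumed noise filter).
    """
    token_lists = [preprocess(doc) for doc in documents]
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        vec = {}
        for term, count in Counter(tokens).items():
            weight = count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            if weight >= min_weight:
                vec[term] = weight
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(snippets_a, snippets_b, min_weight=0.0):
    """Similarity of two words, each represented by its concatenated snippets."""
    vec_a, vec_b = tfidf_vectors(
        [" ".join(snippets_a), " ".join(snippets_b)], min_weight)
    return cosine(vec_a, vec_b)


def pearson(xs, ys):
    """Pearson correlation between system scores and human ratings."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy snippets for illustration only (not real Wikipedia output).
car = ["A car is a wheeled motor vehicle used for transportation.",
       "Cars are driven on roads."]
automobile = ["An automobile is a motor vehicle with wheels.",
              "Automobiles carry passengers on roads."]
fruit = ["An apple is an edible fruit produced by a tree."]

print(word_similarity(car, automobile))
print(word_similarity(car, fruit))
```

Scoring every word pair of a benchmark such as Rubenstein-Goodenough with `word_similarity` and passing those scores, together with the human ratings, to `pearson` yields the kind of correlation figure the abstract reports. The toy snippets are invented and say nothing about the paper's results. Computing IDF over only the two documents being compared is a simplification; IDF could instead be estimated over a larger snippet collection.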
Pages: 251 / +
Number of pages: 3
Related papers
50 in total
  • [41] Novel Approach to Find Semantic Similarity Measure between Words
    Sahni, Lakshay
    Sehgal, Anubhav
    Kochar, Shaivi
    Ahmad, Faiyaz
    Ahmad, Tanvir
    PROCEEDINGS OF 2014 2ND INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL AND BUSINESS INTELLIGENCE (ISCBI), 2014, : 89 - 92
  • [42] An efficient technique for finding semantic similarity and their frequency between words
    Yadav, Sonu
    Sain, Deepak
    2013 INTERNATIONAL CONFERENCE ON GREEN COMPUTING, COMMUNICATION AND CONSERVATION OF ENERGY (ICGCE), 2013, : 159 - 163
  • [43] Analysis of Japanese Wikipedia Category for Constructing Wikipedia Ontology and Semantic Similarity Measure
    Yoshioka, Masaharu
    INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2014, 2014, 8870 : 470 - 481
  • [44] Measuring Semantic Similarity between Concepts in Visual Domain
    Wang, Zhiyong
    Guan, Genliang
    Wang, Jiajun
    Feng, Dagan
    2008 IEEE 10TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, VOLS 1 AND 2, 2008, : 632 - +
  • [45] Measuring semantic similarity between Gene Ontology terms
    Couto, Francisco M.
    Silva, Mario J.
    Coutinho, Pedro M.
    DATA & KNOWLEDGE ENGINEERING, 2007, 61 (01) : 137 - 152
  • [46] Measuring semantic similarity between geospatial conceptual regions
    Schwering, A
    Raubal, M
    GEOSPATIAL SEMANTICS, PROCEEDINGS, 2005, 3799 : 90 - 106
  • [47] Semantic Similarity Measurements for Multi-lingual Short Texts Using Wikipedia
    Nakamura, Tatsuya
    Shirakawa, Masumi
    Hara, Takahiro
    Nishio, Shojiro
    2014 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 2, 2014, : 22 - 29
  • [48] Measuring Semantic Relatedness using Wikipedia Revision Information in a Signed Network
    Yang, Wen-Teng
    Kao, Hung-Yu
    2011 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI 2011), 2011, : 69 - 74
  • [49] An efficient approach for measuring semantic relatedness using Wikipedia bidirectional links
    Zhu, Xinhua
    Guo, Qingsong
    Zhang, Bo
    Li, Fei
    APPLIED INTELLIGENCE, 2019, 49 (10) : 3708 - 3730