Measuring Semantic Similarity between Words Using Wikipedia

Cited by: 11
Authors
Lu Zhiqiang [1]
Shao Werimin [1]
Yu Zhenhua [2]
Affiliations
[1] Shanghai Univ, Sch Engn & Comp Sci, Shanghai, Peoples R China
[2] Second Ltd Liabil Co, Shandong, Peoples R China
Source
WISM: 2009 INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS | 2009
Keywords
Text semantic similarity; Wikipedia; TF-IDF; cosine similarity
DOI
10.1109/WISM.2009.59
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semantic similarity measures play an important role in the extraction of semantic relations and are widely used in Natural Language Processing (NLP) and Information Retrieval (IR). This paper presents a new Web-based method for measuring the semantic similarity between words. Unlike other methods, which are based on a taxonomy or on Internet search engines, our method uses snippets from Wikipedia to calculate the semantic similarity between words by means of TF-IDF and cosine similarity. A stemming algorithm and stop-word removal are used to preprocess the Wikipedia snippets. We evaluate the results under different thresholds in order to reduce interference from noise and redundancy. The method was empirically evaluated on the Rubenstein-Goodenough benchmark dataset, where it achieves a higher correlation (0.615) than some existing methods. The evaluation results show that our method improves accuracy and is more robust for measuring semantic similarity between words.
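The pipeline named in the abstract (stop-word removal, stemming, TF-IDF weighting, cosine similarity) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the crude suffix-stripping `naive_stem` (a stand-in for a real stemmer), the smoothed IDF formula, and all function names are assumptions, and the thresholding step is left out because the abstract does not say what the thresholds apply to. The snippets are passed in as plain strings; retrieving them from Wikipedia is outside the sketch.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used in the paper.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "or", "in",
              "to", "for", "with", "on", "by", "that", "it", "as"}


def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict) per document.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms
    shared by every document still carry a non-zero weight.
    """
    tokenized = [preprocess(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(snippets_a, snippets_b):
    """Similarity of two words, each represented by its list of snippets."""
    vec_a, vec_b = tfidf_vectors([" ".join(snippets_a), " ".join(snippets_b)])
    return cosine_similarity(vec_a, vec_b)
```

For example, `word_similarity(["A car is a wheeled motor vehicle."], ["An automobile is a motor vehicle with wheels."])` yields a score strictly between 0 and 1, driven by the shared stems `wheel`, `motor` and `vehicle`.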
Pages: 251+
Page count: 3