Sentence similarity using weighted path and similarity matrices

Times cited: 1
Authors
Javadzadeh, Reza [1]
Zahedi, Morteza [1]
Rahimi, Marzea [1]
Affiliation
[1] Shahrood Univ Technol, Sch Comp & IT Engn, Shahrood, Iran
Keywords
Sentence similarity; plagiarism detection; text mining; vector space model; paraphrase database; SEMANTIC SIMILARITY; ALGORITHM; SEARCH; SETS;
DOI
10.3906/elk-1901-91
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sentence similarity is the task of assessing how similar two snippets of text are. Similarity techniques are used extensively in clustering, summarization, classification, plagiarism detection and related tasks. Because sentences contain only a small number of words, sentence similarity is considered a difficult problem in natural language processing. Solving it raises two issues: (1) which technique to use for word-pair similarity and (2) how to generalize word-pair similarity to sentence pairs. We used the weighted path measure, a WordNet-based similarity assessment, together with the paraphrase database (PPDB) to obtain word-pair similarity values. We then extracted the maximum values from the pairwise similarity matrix and used them to compute a similarity value for the sentence pair. We also incorporated a vector space model technique to make the similarity measure more robust. On the STSS65 test dataset, our method outperformed state-of-the-art methods, reaching a Pearson correlation of 87% with human similarity scores. On the STSS131 test dataset, evaluated the same way, our approach performed on par with other methods. On both datasets, it outperformed all the other WordNet-based methods compared.
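The abstract does not give the exact aggregation formula, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It illustrates the general idea: build a pairwise word-similarity matrix, take the maximum of each row and each column, blend that score with a vector space model cosine, and compare system scores with human ratings using Pearson's correlation. Everything not named in the abstract is hypothetical: the toy scores in `TOY_PAIR_SCORES`, the helper names, averaging the maxima over both directions, and the blend weight `alpha`. A real implementation would replace `word_sim` with the weighted path measure over WordNet plus paraphrase-database lookups.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for the paper's word-pair measure (weighted path + PPDB).
# These scores are invented for illustration only.
TOY_PAIR_SCORES = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("drives", "runs")): 0.6,
    frozenset(("fast", "quickly")): 0.8,
}


def word_sim(a: str, b: str) -> float:
    """Toy word similarity in [0, 1]; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_PAIR_SCORES.get(frozenset((a, b)), 0.0)


def similarity_matrix(s1: list[str], s2: list[str]) -> list[list[float]]:
    """Row i holds the similarity of word i of s1 to every word of s2."""
    return [[word_sim(a, b) for b in s2] for a in s1]


def max_matrix_score(s1: list[str], s2: list[str]) -> float:
    """Average each word's best match, taken in both directions (assumed scheme)."""
    if not s1 or not s2:
        return 0.0
    m = similarity_matrix(s1, s2)
    row_max = [max(row) for row in m]        # best match in s2 for each word of s1
    col_max = [max(col) for col in zip(*m)]  # best match in s1 for each word of s2
    return (sum(row_max) + sum(col_max)) / (len(row_max) + len(col_max))


def cosine_tf(s1: list[str], s2: list[str]) -> float:
    """Vector space model: cosine of raw term-frequency vectors."""
    c1, c2 = Counter(s1), Counter(s2)
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(
        sum(v * v for v in c2.values())
    )
    return dot / norm if norm else 0.0


def sentence_sim(s1: list[str], s2: list[str], alpha: float = 0.5) -> float:
    """HYPOTHETICAL linear blend of the matrix score and the VSM cosine."""
    return alpha * max_matrix_score(s1, s2) + (1 - alpha) * cosine_tf(s1, s2)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson's r between system scores and human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson's r is undefined for constant input")
    return cov / (sx * sy)


a = "the car drives fast".split()
b = "the automobile runs quickly".split()
print(f"{max_matrix_score(a, b):.3f}")  # 0.825
print(f"{cosine_tf(a, b):.3f}")         # 0.250
print(f"{sentence_sim(a, b):.4f}")      # 0.5375
```

In this toy pair, every word finds a close match in the other sentence, so the matrix score is high (0.825). The term-frequency cosine only credits the shared word "the" (0.25). Blending the two lets synonym matches offset low lexical overlap, which is the motivation the abstract gives for combining the measures.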
Pages: 3779-3790
Page count: 12
References: 37