Sentence similarity using weighted path and similarity matrices

Cited by: 1

Authors:

Javadzadeh, Reza [1]

Zahedi, Morteza [1]

Rahimi, Marzea [1]

Affiliations:

[1] Shahrood Univ Technol, Sch Comp & IT Engn, Shahrood, Iran

Source:

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES | 2019 / Vol. 27 / No. 5

Keywords:

Sentence similarity; plagiarism detection; text mining; vector space model; paraphrase database; SEMANTIC SIMILARITY; ALGORITHM; SEARCH; SETS;

DOI:

10.3906/elk-1901-91

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Sentence similarity is the task of assessing how similar two snippets of text are. Similarity techniques are used extensively in clustering, summarization, classification, plagiarism detection, and other tasks. Because a sentence contains only a small set of words, sentence similarity is considered a difficult problem in natural language processing. Solving it raises two questions: (1) which similarity technique to use for word pairs and (2) how to generalize word-pair similarity to sentence pairs. We used the weighted path, a WordNet-based similarity assessment, and the paraphrase database to obtain word-pair similarity values. We then extracted the maximum values from the pairwise similarity matrix and computed a similarity value for each sentence pair. We also incorporated a vector space model technique to form a robust similarity measure. On the STSS65 test dataset, our method outperformed state-of-the-art methods, reaching a Pearson's correlation of 87% with human similarity scores. On the STSS131 test data, our approach performed on par with other methods under the same measure. It also outperformed all the other WordNet-based methods compared on both datasets.
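The pipeline the abstract describes (word-pair scores, a pairwise similarity matrix, maximum extraction, and a vector space component) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and not taken from the paper: `word_sim` is a hypothetical stand-in for the weighted-path and paraphrase-database lookups and uses invented toy scores, the matrix maxima are aggregated by a plain mean over rows and columns, the vector space part is a simple bag-of-words cosine, and the two scores are blended with a hypothetical weight `alpha`.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy word-pair scores standing in for the weighted-path (WordNet)
# and paraphrase-database lookups; the values are invented for illustration.
TOY_SCORES = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("boy", "lad")): 0.8,
}


def word_sim(a, b):
    """Return a similarity in [0, 1] for a word pair (toy lookup, an assumption)."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


def matrix_sim(s1, s2):
    """Build the pairwise similarity matrix and average its row and column maxima."""
    if not s1 or not s2:
        return 0.0
    matrix = [[word_sim(a, b) for b in s2] for a in s1]
    row_max = [max(row) for row in matrix]        # best match for each word of s1
    col_max = [max(col) for col in zip(*matrix)]  # best match for each word of s2
    return (sum(row_max) + sum(col_max)) / (len(row_max) + len(col_max))


def cosine_sim(s1, s2):
    """Cosine similarity of bag-of-words count vectors (a simple vector space model)."""
    c1, c2 = Counter(s1), Counter(s2)
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def sentence_sim(s1, s2, alpha=0.5):
    """Blend the two scores; alpha is a hypothetical weight, not a value from the paper."""
    return alpha * matrix_sim(s1, s2) + (1 - alpha) * cosine_sim(s1, s2)


if __name__ == "__main__":
    a = ["the", "car", "stopped"]
    b = ["the", "automobile", "stopped"]
    print(round(sentence_sim(a, b), 4))
```

In this sketch the matrix score rewards the near-synonym pair "car"/"automobile", which the bag-of-words cosine alone would treat as unrelated. That is one plausible reason for combining a lexical-resource score with a vector space score, though the paper's actual combination rule may differ.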


Pages: 3779-3790

Page count: 12
