Assessing sentence similarity through lexical, syntactic and semantic analysis

Cited by: 37
Authors
Ferreira, Rafael [1 ,2 ]
Lins, Rafael Dueire [1 ]
Simske, Steven J. [3 ]
Freitas, Fred [1 ]
Riss, Marcelo [4 ]
Affiliations
[1] Univ Fed Pernambuco, Informat Ctr, Recife, PE, Brazil
[2] Univ Fed Rural Pernambuco, Dept Stat & Informat, Recife, PE, Brazil
[3] HP Labs, Ft Collins, CO 80528 USA
[4] HP Brazil, Porto Alegre, RS, Brazil
Keywords
Graph-based model; Sentence simplification; Relation extraction; Inductive logic programming; WordNet
DOI
10.1016/j.csl.2016.01.003
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sentence similarity methods assess the degree of similarity between sentences. They play an important role in areas such as text summarization, search, and categorization, as well as machine translation. Current methods for assessing sentence similarity are based only on the similarity between the words in the sentences: they either represent sentences as bag-of-words vectors or are restricted to the syntactic information of the sentences. Such strategies leave two important problems in language understanding unaddressed: word order and the meaning of the sentence as a whole. The sentence similarity measure presented here substantially improves and refines a recently published method that takes into account the lexical, syntactic and semantic components of sentences. The new method was benchmarked using Li-McLean, where it outperforms state-of-the-art systems and achieves results comparable to human evaluation. In addition, the proposed method was extensively tested on the SemEval 2012 sentence similarity test set and in evaluating the degree of similarity between summaries using the CNN-corpus. In both cases, the proposed measure proved effective and useful. © 2016 Elsevier Ltd. All rights reserved.
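The word-order limitation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it is a generic bag-of-words cosine similarity, and the helper names (`bag_of_words`, `cosine_similarity`) and the example sentences are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Count word occurrences; word order is discarded (hypothetical helper)."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


# Two sentences with the same words but opposite meanings.
s1 = "the dog bit the man"
s2 = "the man bit the dog"
score = cosine_similarity(bag_of_words(s1), bag_of_words(s2))
print(score)  # approximately 1.0, up to floating-point rounding
```

Both sentences produce identical word counts, so a purely word-based measure rates them as essentially identical even though who bit whom is reversed. This is the kind of case that motivates adding syntactic and semantic information.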
Pages: 1-28
Number of pages: 28