Using Part-of-Speech Tags as Deep-Syntax Indicators in Determining Short-Text Semantic Similarity

Cited by: 18
Authors
Batanovic, Vuk [1]
Bojic, Dragan [1]
Affiliations
[1] Sch Elect Engn, Belgrade 11120, Serbia
Keywords
short-text semantic similarity; statistical similarity; corpus-based measures; part-of-speech tags; POS weighting; syntactic information; bag-of-words model; natural language processing
DOI
10.2298/CSIS131127082B
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper presents POST STSS, a method of determining short-text semantic similarity in which part-of-speech tags are used as indicators of the deeper syntactic information usually extracted by more advanced tools like parsers and semantic role labelers. Our model employs a part-of-speech weighting scheme and is based on a statistical bag-of-words approach. It does not require either hand-crafted knowledge bases or advanced syntactic tools, which makes it easily applicable to languages with limited natural language processing resources. By using a paraphrase recognition test, we demonstrate that our system achieves a higher accuracy than all existing statistical similarity algorithms and solutions of a more structural kind.
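As a rough illustration of what a part-of-speech weighting scheme over a bag-of-words model can look like, the sketch below computes a POS-weighted cosine similarity between two short texts. It is not the POST STSS algorithm: the weight values, the coarse tag names, the `DEFAULT_WEIGHT` fallback, and all function names are assumptions made for this example, and the input is assumed to be already tokenized and tagged.

```python
import math
from collections import defaultdict

# Hypothetical POS weights (illustrative values only, not taken from the paper).
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.6, "ADV": 0.4}
DEFAULT_WEIGHT = 0.1  # assumed weight for function words and any other tag


def weighted_bag(tagged_tokens):
    """Build a POS-weighted bag-of-words vector from (word, tag) pairs."""
    bag = defaultdict(float)
    for word, tag in tagged_tokens:
        bag[word.lower()] += POS_WEIGHTS.get(tag, DEFAULT_WEIGHT)
    return dict(bag)


def cosine_similarity(bag_a, bag_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * bag_b.get(term, 0.0) for term, w in bag_a.items())
    norm_a = math.sqrt(sum(w * w for w in bag_a.values()))
    norm_b = math.sqrt(sum(w * w for w in bag_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def pos_weighted_similarity(tagged_a, tagged_b):
    """Similarity of two pre-tagged short texts under the assumed weights."""
    return cosine_similarity(weighted_bag(tagged_a), weighted_bag(tagged_b))


if __name__ == "__main__":
    # Hand-tagged toy sentences; a real pipeline would use a POS tagger.
    s1 = [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]
    s2 = [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]
    s3 = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")]
    print(pos_weighted_similarity(s1, s2))  # shares the noun and the verb
    print(pos_weighted_similarity(s1, s3))  # shares only the determiner
```

Under these assumed weights, two texts that share a noun and a verb score far higher than two texts that share only a determiner, which is the intuition behind letting POS tags stand in for deeper syntactic cues.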
Pages: 1-31
Page count: 31