Using Part-of-Speech Tags as Deep-Syntax Indicators in Determining Short-Text Semantic Similarity

Cited by: 18

Authors:

Batanovic, Vuk [1]

Bojic, Dragan [1]

Affiliations:

[1] Sch Elect Engn, Belgrade 11120, Serbia

Source:

COMPUTER SCIENCE AND INFORMATION SYSTEMS | 2015 / Vol. 12 / No. 01

Keywords:

short-text semantic similarity; statistical similarity; corpus-based measures; part-of-speech tags; POS weighting; syntactic information; bag-of-words model; natural language processing;

DOI:

10.2298/CSIS131127082B

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

This paper presents POST STSS, a method of determining short-text semantic similarity in which part-of-speech tags are used as indicators of the deeper syntactic information usually extracted by more advanced tools like parsers and semantic role labelers. Our model employs a part-of-speech weighting scheme and is based on a statistical bag-of-words approach. It does not require either hand-crafted knowledge bases or advanced syntactic tools, which makes it easily applicable to languages with limited natural language processing resources. By using a paraphrase recognition test, we demonstrate that our system achieves a higher accuracy than all existing statistical similarity algorithms and solutions of a more structural kind.
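The general idea of combining a bag-of-words representation with part-of-speech weighting can be illustrated with a minimal sketch. This is not the POST STSS formula, which the abstract does not give: the cosine similarity measure, the tag names, the weight values in `POS_WEIGHTS`, and the function names are all assumptions chosen for illustration. The input is assumed to be already POS-tagged, as lists of `(word, tag)` pairs.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical weights per POS tag; illustrative values only, not from the paper.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.6, "ADV": 0.4}
DEFAULT_WEIGHT = 0.1  # assumed weight for any other tag (e.g. determiners)


def weighted_bag(tagged_tokens):
    """Build a bag-of-words vector whose counts are scaled by POS weight."""
    bag = defaultdict(float)
    for word, tag in tagged_tokens:
        bag[word.lower()] += POS_WEIGHTS.get(tag, DEFAULT_WEIGHT)
    return bag


def pos_weighted_similarity(tagged_a, tagged_b):
    """Cosine similarity between two POS-weighted bags of words."""
    bag_a, bag_b = weighted_bag(tagged_a), weighted_bag(tagged_b)
    # Only words present in both texts contribute to the dot product.
    dot = sum(w * bag_b[word] for word, w in bag_a.items() if word in bag_b)
    norm_a = sqrt(sum(w * w for w in bag_a.values()))
    norm_b = sqrt(sum(w * w for w in bag_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Usage: two short texts that share a noun but differ in verb and determiner.
s1 = [("The", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]
s2 = [("A", "DET"), ("cat", "NOUN"), ("rests", "VERB")]
score = pos_weighted_similarity(s1, s2)
```

Because the shared word here is a noun, it dominates the score under these assumed weights; a shared determiner would contribute far less. A sketch like this needs only a POS tagger, which is consistent with the abstract's point about languages with limited processing resources.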


Pages: 1-31

Page count: 31
