Transformer-Based Neural Network for Answer Selection in Question Answering

Cited by: 63
Authors
Shao, Taihua [1 ]
Guo, Yupu [1 ]
Chen, Honghui [1 ]
Hao, Zepeng [1 ]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Informat Syst Engn Lab, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Answer selection; deep learning; question answering; Transformer;
DOI
10.1109/ACCESS.2019.2900753
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Answer selection is a crucial subtask in question answering (QA) systems. Conventional approaches to this task concentrate mainly on developing linguistic tools, which limits both their performance and their practicality. With the tremendous success of deep learning in natural language processing, answer selection approaches based on deep learning have been investigated extensively. However, the traditional neural networks employed in existing answer selection models, i.e., recurrent neural networks and convolutional neural networks, typically struggle to capture global text information because of their operating mechanisms. The recent Transformer network is considered good at extracting global information because it relies solely on the self-attention mechanism. In this paper, we therefore design a Transformer-based neural network for answer selection, in which a bidirectional long short-term memory (BiLSTM) network is placed behind the Transformer to capture both global information and sequential features in the question or answer sentence. Unlike the original Transformer, our Transformer-based network targets sentence embedding rather than the seq2seq task. In addition, to incorporate sequential features we employ a BiLSTM rather than the positional encoding used by the universal Transformer. Furthermore, we apply three aggregation strategies to generate sentence embeddings for the question and the answer, i.e., weighted mean pooling, max pooling, and attentive pooling, yielding three corresponding Transformer-based models: QA-TFWP, QA-TFMP, and QA-TFAP. Finally, we evaluate our proposals on the popular WikiQA dataset. The experimental results demonstrate that the proposed Transformer-based answer selection models outperform several competitive baselines. In detail, our best model exceeds the state-of-the-art baseline by up to 2.37%, 2.83%, and 3.79% in terms of MAP, MRR, and accuracy, respectively.
Pages: 26146-26156
Number of pages: 11
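
As a rough illustration of the architecture summarized in the abstract above, the following is a minimal PyTorch sketch of a sentence encoder built from a Transformer encoder stack (global information via self-attention) followed by a BiLSTM (sequential features, in place of positional encoding) and a pooling step that yields a sentence embedding. All module choices, hyperparameters, the simple max/mean pooling shown, and the cosine-similarity scoring are illustrative assumptions for the sketch, not the configuration reported in the paper.

# Hypothetical sketch of a Transformer + BiLSTM sentence encoder for answer selection.
import torch
import torch.nn as nn


class TransformerBiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=300, nhead=6, num_layers=2,
                 lstm_hidden=150, pooling="max"):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers)
        # BiLSTM placed behind the Transformer to inject word-order information.
        self.bilstm = nn.LSTM(d_model, lstm_hidden, batch_first=True,
                              bidirectional=True)
        self.pooling = pooling

    def forward(self, token_ids, padding_mask=None):
        # token_ids: (batch, seq_len); padding_mask: True at padded positions.
        x = self.embed(token_ids)
        x = self.transformer(x, src_key_padding_mask=padding_mask)
        x, _ = self.bilstm(x)            # (batch, seq_len, 2 * lstm_hidden)
        if self.pooling == "max":        # max pooling (QA-TFMP-style aggregation)
            sent, _ = x.max(dim=1)
        else:                            # plain mean as a stand-in for the
            sent = x.mean(dim=1)         # weighted-mean variant; attentive
        return sent                      # pooling is omitted for brevity


# Usage: encode the question and a candidate answer, then score the pair
# by cosine similarity between the two sentence embeddings.
encoder = TransformerBiLSTMEncoder(vocab_size=30000)
question = torch.randint(0, 30000, (4, 20))
answer = torch.randint(0, 30000, (4, 40))
score = torch.cosine_similarity(encoder(question), encoder(answer), dim=-1)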