Bidirectional LSTM with attention mechanism and convolutional layer for text classification

Cited by: 722
Authors
Liu, Gang [1]
Guo, Jiabao [1]
Affiliation
[1] Hubei Univ Technol, Sch Comp Sci, Wuhan 430072, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Long short-term memory; Attention mechanism; Natural language processing; Text classification; SHORT-TERM-MEMORY; NEURAL-NETWORKS;
DOI
10.1016/j.neucom.2019.01.078
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neural network models have been widely used in the field of natural language processing (NLP). Recurrent neural networks (RNNs), which can process sequences of arbitrary length, are common methods for sequence modeling tasks. Long short-term memory (LSTM) is one kind of RNN and has achieved remarkable performance in text classification. However, due to the high dimensionality and sparsity of text data, and to the complex semantics of natural language, text classification presents difficult challenges. To solve these problems, this paper proposes a novel, unified architecture that combines a bidirectional LSTM (BiLSTM), an attention mechanism, and a convolutional layer. The proposed architecture is called attention-based bidirectional long short-term memory with convolution layer (AC-BiLSTM). In AC-BiLSTM, the convolutional layer extracts higher-level phrase representations from the word embedding vectors, and the BiLSTM accesses both the preceding and succeeding context representations. An attention mechanism is employed to give different focus to the information output by the hidden layers of the BiLSTM. Finally, a softmax classifier is used to classify the processed context information. AC-BiLSTM is able to capture both the local features of phrases and global sentence semantics. Experimental verifications are conducted on six sentiment classification datasets and a question classification dataset, including a detailed analysis of AC-BiLSTM. The results clearly show that AC-BiLSTM outperforms other state-of-the-art text classification methods in terms of classification accuracy. (C) 2019 Elsevier B.V. All rights reserved.
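The attention step described in the abstract, weighting each BiLSTM hidden state before pooling into a single context vector, can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the weight names `W`, `b`, `u` and the dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, W, b, u):
    """Attention pooling over BiLSTM hidden states (illustrative sketch).
    H: (T, 2d) hidden states (forward + backward concatenated)
    W: (a, 2d), b: (a,), u: (a,) -- hypothetical attention parameters.
    Returns the (2d,) context vector and the (T,) attention weights."""
    M = np.tanh(H @ W.T + b)   # (T, a) nonlinear projection of each state
    scores = M @ u             # (T,) alignment score per time step
    alpha = softmax(scores)    # attention weights, nonnegative, sum to 1
    return alpha @ H, alpha    # weighted sum of hidden states

# Toy usage: 5 time steps, 2d = 8 hidden units, attention size a = 4.
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))
W = rng.standard_normal((4, 8))
b = rng.standard_normal(4)
u = rng.standard_normal(4)
ctx, alpha = attention_pool(H, W, b, u)
```

The context vector `ctx` would then feed the softmax classifier; in the full AC-BiLSTM pipeline, `H` comes from a BiLSTM run over convolutional phrase features rather than random data.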
Pages: 325-338
Page count: 14