Global context-dependent recurrent neural network language model with sparse feature learning

Cited by: 10
Authors
Deng, Hongli [1 ,2 ]
Zhang, Lei [1 ]
Wang, Lituan [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Sichuan, Peoples R China
[2] China West Normal Univ, Educ & Informat Technol Ctr, Nanchong 637002, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Recurrent neural network; Language model; Global context; Sparse feature; Deep learning;
DOI
10.1007/s00521-017-3065-x
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recurrent neural network language models (RNNLMs) are an important class of language model. In recent years, context-dependent RNNLMs have been the most widely used, because they exploit additional information summarized from other sequences to access a larger context. However, when the sequences are mutually independent or randomly shuffled, these models cannot learn useful additional information, so no larger context is taken into account. To ensure that the model obtains more contextual information in any case, a new language model is proposed in this paper. It captures the global context using only the words within the current sequence, incorporating all the words preceding and following the target, without resorting to additional information summarized from other sequences. The model comprises two main modules: a recurrent global context module that extracts the global contextual information of the target, and a sparse feature learning module that learns sparse features for all possible output words in order to distinguish the target word from the others at the output layer. The proposed model was evaluated on three language modeling tasks. Experimental results show that it improves perplexity, speeds up the convergence of the network, and learns better word embeddings than the comparison language models.
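As a rough illustration only (this record contains the abstract but not the model equations), the following PyTorch sketch approximates the two-module design described above: a recurrent module that summarizes all preceding and following words of each target within the current sequence, and an output-word feature matrix encouraged to be sparse. The use of GRU cells, the forward/backward split, the dimensions, and the L1-based sparsity are assumptions for the sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GlobalContextRNNLM(nn.Module):
    """Sketch: predict each word of the current sequence from all of its
    preceding and following words (global context within the sequence),
    with an L1 penalty pushing the output-word features toward sparsity."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Two unidirectional GRUs: one reads the sequence left-to-right
        # (preceding context), one reads it right-to-left (following context).
        self.fwd_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.bwd_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Output-word feature matrix ("sparse feature learning module",
        # approximated here by L1 regularization of its rows).
        self.out = nn.Linear(2 * hidden_dim, vocab_size, bias=False)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) word indices of the current sequence.
        x = self.embed(token_ids)
        h_fwd, _ = self.fwd_rnn(x)                      # state after w_1..w_t
        h_bwd, _ = self.bwd_rnn(torch.flip(x, dims=[1]))
        h_bwd = torch.flip(h_bwd, dims=[1])             # state after w_n..w_t
        # Context for position t excludes w_t itself: forward state at t-1
        # and backward state at t+1, zero-padded at the sequence boundaries.
        zeros = h_fwd.new_zeros(h_fwd.size(0), 1, h_fwd.size(2))
        context = torch.cat(
            [torch.cat([zeros, h_fwd[:, :-1]], dim=1),
             torch.cat([h_bwd[:, 1:], zeros], dim=1)], dim=2)
        return self.out(context)                        # logits per position

    def sparsity_penalty(self):
        # Added to the cross-entropy loss to encourage sparse output features.
        return self.out.weight.abs().mean()
```

In training, the penalty would be weighted and added to the usual cross-entropy objective, e.g. `loss = ce_loss + lam * model.sparsity_penalty()`, where `lam` is a hypothetical regularization coefficient.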
Pages: 999-1011
Number of pages: 13