INVESTIGATION OF SAMPLING TECHNIQUES FOR MAXIMUM ENTROPY LANGUAGE MODELING TRAINING

Cited by: 0
Authors
Chen, Xie [1]
Zhang, Jun [1]
Anastasakos, Tasos [1]
Alleva, Fil [1]
Affiliation
[1] Microsoft, Redmond, WA 98052 USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
Maximum entropy language model; importance sampling; noise contrastive estimation; sampled softmax
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Maximum entropy language models (MaxEnt LMs) are log-linear models able to incorporate various hand-crafted features and non-linguistic information. Standard MaxEnt LMs are computationally expensive for tasks with a large vocabulary because of the normalization term in the denominator. To address this issue, most recent work on MaxEnt LMs has used class-based MaxEnt LMs. However, the performance of class-based MaxEnt LMs can be sensitive to the word clustering, and generating high-quality word classes is itself time-consuming. Motivated by the recent success of sampling techniques in accelerating the training of neural network language models, this paper investigates three widely used sampling techniques, importance sampling, noise contrastive estimation (NCE) and sampled softmax, for MaxEnt LM training. Experimental results on the Google One Billion Word corpus and an internal speech recognition system demonstrate the effectiveness of sampled softmax and NCE for MaxEnt LM training. However, importance sampling is not effective for MaxEnt LM training despite its similarity to sampled softmax. To our knowledge, this is the first work to apply sampling techniques to MaxEnt LM training.
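A quick sketch of the normalization bottleneck and the sampled-softmax workaround the abstract refers to (the notation below is assumed for illustration and is not taken from the paper): a MaxEnt LM scores a word $w$ given history $h$ as

$$P(w \mid h) = \frac{\exp\bigl(\sum_i \lambda_i f_i(h, w)\bigr)}{Z(h)}, \qquad Z(h) = \sum_{w' \in V} \exp\Bigl(\sum_i \lambda_i f_i(h, w')\Bigr),$$

where the $f_i$ are hand-crafted feature functions, the $\lambda_i$ their learned weights, and $V$ the vocabulary, so each training example pays a cost proportional to $|V|$ for the denominator. Sampled softmax would instead normalize over the target word plus a small set $S$ of words drawn from a proposal distribution $q$, with each candidate's score adjusted by subtracting $\log q(w')$, reducing the per-example cost from $O(|V|)$ to roughly $O(|S|)$.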
Pages: 7240-7244
Number of pages: 5