UNSUPERVISED LANGUAGE MODEL ADAPTATION USING N-GRAM WEIGHTING

Cited by: 0
Authors
Haidar, Md. Akmal [1]
O'Shaughnessy, Douglas [1]
Affiliation
[1] INRS EMT, Montreal, PQ H5A 1K6, Canada
Source
2011 24th Canadian Conference on Electrical and Computer Engineering (CCECE), 2011
Keywords
Mixture models; speech recognition; latent Dirichlet allocation; language model adaptation
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
In this paper, we introduce a method for weighting topic models in mixture language model adaptation using the n-grams of the topic models. Topic clusters are formed with a hard-clustering method that assigns each document to a single topic. The chosen topic is the one from which the largest number of that document's words are drawn in the Latent Dirichlet Allocation (LDA) analysis. The n-grams of the topics produced by hard clustering are then used to compute the mixture weights of the component topic models. Instead of using all words in the training vocabulary, the LDA analysis uses a subset of words selected with information retrieval techniques. The proposed n-gram weighting approach yields significant reductions in perplexity and word error rate (WER) compared with a unigram weighting approach used in the literature.
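The abstract describes the pipeline only at a high level. Below is a minimal Python sketch of the hard-clustering and n-gram weighting steps, built on the following assumptions:

- Per-token LDA topic assignments are already available, for example from a Gibbs sampler.
- The adaptation text is, hypothetically, a first-pass recognition hypothesis, since the adaptation is unsupervised.
- The mixture weight of topic k is taken as the normalized sum, over the adaptation text's n-grams, of the fraction of each n-gram's training occurrences that fall in topic k's cluster.

This weighting formula and the function names (`hard_cluster`, `topic_ngram_counts`, `ngram_mixture_weights`, `mixture_prob`) are illustrative assumptions, not the authors' exact formulation. The information-retrieval word selection step is not shown.

```python
from collections import Counter
from typing import List, Sequence, Tuple

NGram = Tuple[str, ...]


def hard_cluster(word_topics: Sequence[Sequence[int]]) -> List[int]:
    """Assign each document to the topic that generated most of its words.

    word_topics[d] holds the LDA topic of every word token in document d.
    Documents must be non-empty.
    """
    clusters = []
    for z in word_topics:
        counts = Counter(z)
        # max() returns the first maximal item, so iterating over sorted
        # topic ids breaks ties toward the lower topic index.
        clusters.append(max(sorted(counts), key=lambda k: counts[k]))
    return clusters


def ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """All contiguous n-grams of a token sequence (empty if too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def topic_ngram_counts(docs: Sequence[Sequence[str]], clusters: Sequence[int],
                       num_topics: int, n: int) -> List[Counter]:
    """Count n-grams separately for each hard-clustered topic."""
    counts = [Counter() for _ in range(num_topics)]
    for tokens, k in zip(docs, clusters):
        counts[k].update(ngrams(tokens, n))
    return counts


def ngram_mixture_weights(adapt_tokens: Sequence[str],
                          topic_counts: List[Counter], n: int) -> List[float]:
    """Hypothetical n-gram weighting: score topic k by summing, over the
    adaptation n-grams, the share of each n-gram's training count in topic k."""
    num_topics = len(topic_counts)
    scores = [0.0] * num_topics
    for g in ngrams(adapt_tokens, n):
        total = sum(c[g] for c in topic_counts)
        if total == 0:
            continue  # n-gram unseen in every topic: no evidence either way
        for k, c in enumerate(topic_counts):
            scores[k] += c[g] / total
    z = sum(scores)
    if z == 0:
        return [1.0 / num_topics] * num_topics  # fall back to uniform weights
    return [s / z for s in scores]


def mixture_prob(component_probs: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Linear interpolation: P(w|h) = sum_k lambda_k * P_k(w|h)."""
    return sum(w * p for w, p in zip(weights, component_probs))


if __name__ == "__main__":
    docs = [["stock", "market", "falls"],
            ["rain", "expected", "today"],
            ["heavy", "rain", "expected"]]
    word_topics = [[0, 0, 1], [1, 1, 0], [1, 1, 1]]
    clusters = hard_cluster(word_topics)
    counts = topic_ngram_counts(docs, clusters, num_topics=2, n=2)
    weights = ngram_mixture_weights(["market", "falls", "heavy", "rain"], counts, n=2)
    print(clusters, weights)
```

In the toy run, the documents are clustered as `[0, 1, 1]`. The adaptation bigrams ("market", "falls") and ("heavy", "rain") each point to a different topic, so the weights come out equal. A unigram variant would be the same function called with `n=1`. That difference is the contrast the abstract draws, though the paper's actual formulas may differ.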
Pages: 857-860
Number of pages: 4