Hierarchical Pitman-Yor-Dirichlet Language Model

Cited by: 30
Authors
Chien, Jen-Tzung [1 ]
Affiliation
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu 30010, Taiwan
Keywords
Bayesian nonparametrics; language model; speech recognition; topic model; unsupervised learning; Poisson-Dirichlet
DOI
10.1109/TASLP.2015.2428632
CLC number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Probabilistic models are often viewed as insufficiently expressive because of the strong assumptions placed on the probability distributions and the fixed model complexity. Bayesian nonparametric learning pursues an expressive probabilistic representation based on nonparametric prior and posterior distributions, with a less assumption-laden approach to inference. This paper presents the hierarchical Pitman-Yor-Dirichlet (HPYD) process as a nonparametric prior for inferring the predictive probabilities of smoothed n-grams with integrated topic information. A hierarchical Chinese restaurant process metaphor is proposed to infer the HPYD language model (HPYD-LM) via Gibbs sampling. This process is equivalent to implementing the hierarchical Dirichlet process-latent Dirichlet allocation (HDP-LDA) with the twisted hierarchical Pitman-Yor LM (HPY-LM) as base measures. Accordingly, the estimated HPYD-LM produces power-law distributions and extracts semantic topics that reflect the properties of natural language. The superiority of HPYD-LM over HPY-LM and other language models is demonstrated by experiments on model perplexity and speech recognition.
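The power-law behavior the abstract attributes to the Pitman-Yor prior can be illustrated with the standard Chinese restaurant process view of that prior. The sketch below is a generic (non-hierarchical) Pitman-Yor CRP sampler, not the paper's HPYD inference procedure; the function name and parameter values are illustrative assumptions.

```python
import random
from collections import Counter

def pitman_yor_crp(n_customers, discount=0.5, strength=1.0, seed=0):
    """Sample table occupancies from a Pitman-Yor Chinese restaurant
    process (illustrative sketch, not the paper's HPYD sampler).

    discount d in [0, 1) and strength theta > -d control the tail of
    the table-size distribution; d = 0 recovers the Dirichlet process.
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for n in range(n_customers):
        # A new table opens with probability (theta + d*K) / (n + theta),
        # where K is the current number of occupied tables.
        p_new = (strength + discount * len(tables)) / (n + strength)
        if rng.random() < p_new:
            tables.append(1)
        else:
            # Otherwise join table k with probability proportional to
            # (c_k - d): the discount d dampens the rich-get-richer effect.
            weights = [c - discount for c in tables]
            r = rng.random() * sum(weights)
            for k, w in enumerate(weights):
                r -= w
                if r <= 0:
                    tables[k] += 1
                    break
    return tables

tables = pitman_yor_crp(10000, discount=0.5, strength=1.0)
# With d > 0 the number of tables grows roughly like n**d, and many
# singleton tables appear, yielding the heavy-tailed (Zipf-like)
# cluster sizes that match word-frequency statistics in natural language.
size_histogram = Counter(tables)
```

In the language-model setting each "table" corresponds to a draw from the base measure (the lower-order n-gram distribution), which is how the hierarchy in HPY-LM and HPYD-LM is built up.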
Pages: 1259-1272 (14 pages)