Gaussian Mixture Optimization Based on Efficient Cross-Validation

Cited by: 6
Authors
Shinozaki, Takahiro [1]
Furui, Sadaoki [1]
Kawahara, Tatsuya [2]
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Tokyo 1528552, Japan
[2] Kyoto Univ, Acad Ctr Comp & Media Studies, Kyoto 6068501, Japan
Keywords
Cross-validation; Gaussian mixture; hidden Markov model (HMM); speech recognition; sufficient statistics
DOI
10.1109/JSTSP.2010.2048235
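Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract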
A Gaussian mixture optimization method is developed by using the cross-validation (CV) likelihood as an objective function instead of the conventional training set likelihood. The optimization is based on reducing the number of mixture components by selecting and merging pairs of Gaussians step by step according to the objective function so as to remove redundant components and improve the generality of the model. The CV likelihood is more effective for avoiding over-fitting than is the conventional likelihood, and it provides a termination criterion that does not rely on empirical thresholds. While the idea is simple, one problem is its infeasible computational cost. To make such optimization practical, an efficient evaluation algorithm using sufficient statistics is proposed. In addition, aggregated CV (AgCV) is developed to further improve the generalization performance of CV. Large-vocabulary speech recognition experiments on oral presentations show that the proposed methods improve speech recognition performance with automatically determined model complexity. The AgCV-based optimization is computationally more expensive than the CV-based method but gives better recognition performance.
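The abstract does not give the evaluation algorithm itself, so the following is only a minimal sketch of the general idea it describes: keeping sufficient statistics per cross-validation fold so that a held-out likelihood can be computed without revisiting the data. It is not the authors' method. It assumes one-dimensional Gaussians, hard assignment of samples to components, and it ignores mixture weights; the names Stats, collect, fit, cv_log_likelihood, and merge_gain are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics of 1-D data: count, sum of x, sum of x^2."""
    n: float = 0.0
    s1: float = 0.0
    s2: float = 0.0

    def __add__(self, other):
        return Stats(self.n + other.n, self.s1 + other.s1, self.s2 + other.s2)

    def __sub__(self, other):
        return Stats(self.n - other.n, self.s1 - other.s1, self.s2 - other.s2)


def collect(xs):
    """Accumulate statistics for one fold of samples."""
    return Stats(float(len(xs)), sum(xs), sum(x * x for x in xs))


def fit(stats, var_floor=1e-6):
    """Maximum-likelihood mean and variance; assumes stats.n > 0."""
    mean = stats.s1 / stats.n
    var = max(stats.s2 / stats.n - mean * mean, var_floor)
    return mean, var


def log_likelihood(stats, mean, var):
    """Total Gaussian log-likelihood of the data summarised by stats."""
    # sum of (x - mean)^2 expanded in terms of the statistics
    sq = stats.s2 - 2.0 * mean * stats.s1 + stats.n * mean * mean
    return -0.5 * (stats.n * math.log(2.0 * math.pi * var) + sq / var)


def cv_log_likelihood(fold_stats):
    """K-fold CV likelihood: train on all folds but one, score the held-out fold."""
    total = sum(fold_stats, Stats())
    score = 0.0
    for held_out in fold_stats:
        mean, var = fit(total - held_out)  # training stats by subtraction
        score += log_likelihood(held_out, mean, var)
    return score


def merge_gain(folds_a, folds_b):
    """Change in CV likelihood if two Gaussians are merged (weights ignored)."""
    merged = [a + b for a, b in zip(folds_a, folds_b)]
    separate = cv_log_likelihood(folds_a) + cv_log_likelihood(folds_b)
    return cv_log_likelihood(merged) - separate


# Illustrative data only: three folds for each of two components.
data_a = [[-1.0, 0.5, 1.0], [0.0, -0.5, 1.5], [0.2, -1.2, 0.8]]
data_b = [[x + 100.0 for x in fold] for fold in data_a]
folds_a = [collect(f) for f in data_a]
folds_b = [collect(f) for f in data_b]
print(merge_gain(folds_a, folds_b))
```

In a sketch like this, a pair would be merged only when merge_gain is positive, and merging would stop once no pair improves the CV likelihood, which mirrors the threshold-free termination criterion mentioned in the abstract. Because each training set is obtained by subtracting one fold's statistics from the total, the cost per evaluation is independent of the number of samples.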
Pages: 540-547
Number of pages: 8