Discriminative cluster adaptive training

Times cited: 17
Authors
Yu, Kai [1]
Gales, Mark J. F.
Affiliations
[1] Univ Cambridge, Engn Dept, Cambridge CB2 1PZ, England
[2] Emmanuel Coll, Cambridge CB2 1PZ, England
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 5
Keywords
cluster adaptive training (CAT); discriminative training; eigenvoices; minimum phone error (MPE); multiple-cluster HMM;
DOI
10.1109/TSA.2005.858555
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Multiple-cluster schemes, such as cluster adaptive training (CAT) or eigenvoice systems, are a popular approach for rapid speaker and environment adaptation. Interpolation weights are used to transform a multiple-cluster, canonical model to a standard hidden Markov model (HMM) set representative of an individual speaker or acoustic environment. Maximum likelihood training for CAT has previously been investigated. However, in state-of-the-art large vocabulary continuous speech recognition systems, discriminative training is commonly employed. This paper investigates applying discriminative training to multiple-cluster systems. In particular, minimum phone error (MPE) update formulae for CAT systems are derived. In order to use MPE in this case, modifications to the standard MPE smoothing function and the prior distribution associated with MPE training are required. A more complex adaptive training scheme combining both interpolation weights and linear transforms, a structured transform (ST), is also discussed within the MPE training framework. Discriminatively trained CAT and ST systems were evaluated on a state-of-the-art conversational telephone speech task. These multiple-cluster systems were found to outperform both standard and adaptively trained systems.
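To illustrate the interpolation described in the abstract, the following is a minimal sketch, assuming NumPy, diagonal covariances, and a hypothetical array layout. It builds speaker-specific Gaussian means as a weighted combination of cluster means and estimates the interpolation weights by maximum likelihood with the cluster means and variances held fixed. This is the ML weight estimate only, not the MPE update derived in the paper; the function names `adapted_means` and `ml_weights` and their argument shapes are illustrative assumptions.

```python
import numpy as np


def adapted_means(cluster_means, weights):
    """Speaker-specific means for a multiple-cluster (CAT-style) model.

    cluster_means: (M, D, P) array; cluster_means[m] holds the P cluster
        means of Gaussian m as columns (hypothetical layout).
    weights: (P,) interpolation weights for one speaker or environment.
    Returns an (M, D) array with mu_m = cluster_means[m] @ weights.
    """
    return cluster_means @ weights


def ml_weights(cluster_means, inv_vars, gammas, obs):
    """Maximum-likelihood interpolation weights for one speaker.

    Cluster means and diagonal variances are held fixed, so the
    log-likelihood is quadratic in the weights and its maximum solves
    G @ weights = k. This is the ML estimate, not an MPE update.

    inv_vars: (M, D) inverse diagonal variances of each Gaussian.
    gammas: (T, M) Gaussian occupation probabilities for each frame.
    obs: (T, D) observation vectors for the speaker.
    """
    num_gauss, _, num_clusters = cluster_means.shape
    G = np.zeros((num_clusters, num_clusters))
    k = np.zeros(num_clusters)
    occ = gammas.sum(axis=0)      # (M,) total occupancy per Gaussian
    first_order = gammas.T @ obs  # (M, D) occupancy-weighted observation sums
    for m in range(num_gauss):
        M_m = cluster_means[m]                # (D, P) cluster means of Gaussian m
        scaled = inv_vars[m][:, None] * M_m   # Sigma_m^{-1} M_m, shape (D, P)
        G += occ[m] * (M_m.T @ scaled)        # accumulate M_m^T Sigma_m^{-1} M_m
        k += scaled.T @ first_order[m]        # accumulate M_m^T Sigma_m^{-1} sum_t gamma o_t
    return np.linalg.solve(G, k)
```

In an EM-style adaptation loop, the occupancies `gammas` would come from aligning the speaker's data against the current model. The discriminative (MPE) training studied in the paper replaces this likelihood criterion, which is why the modified smoothing function and prior mentioned in the abstract are needed.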
Pages: 1694-1703
Number of pages: 10
References
27 in total
[1] Anastasakos T, 1996, ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, P1137, DOI 10.1109/ICSLP.1996.607807
[2] [Anonymous], P INT C SPOK LANG PR
[3] Bahl L, 1986, P INT C AC SPEECH SI, V1, P49, DOI 10.1109/ICASSP.1986.1169179
[4] Botterweck H, 2000, P INT C SPOK LANG, P354
[5] Gales MJF, 1998, Maximum likelihood linear transformations for HMM-based speech recognition, COMPUTER SPEECH AND LANGUAGE, 12(2):75-98
[6] Gales MJF, 2000, Cluster adaptive training of hidden Markov models, IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 8(4):417-428
[7] Gales MJF, 2001, ASRU C
[8] Gales MJF, 1998, P ICSLP, P1783
[9] Gales MJF, 2001, INT C AC SPEECH SIGN
[10] Gauvain JL, Lee CH, 1994, Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains, IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2(2):291-298