Unsupervised Cross-Adaptation Using Language Model and Deep Learning Based Acoustic Model Adaptations

Times cited: 0
Authors
Takagi, Akira [1]
Konno, Kazuki [1]
Kato, Masaharu [1]
Kosaka, Tetsuo [1]
Affiliations
[1] Yamagata Univ, Grad Sch Sci & Engn, Yonezawa, Yamagata, Japan
Source
2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2014
Funding
Japan Society for the Promotion of Science
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Deep learning-based speech recognition is well known to improve performance significantly. In deep learning-based systems, a deep neural network hidden Markov model (DNN-HMM) is used as the acoustic model (AM). Recently, speaker adaptation techniques based on DNN-HMM have also been investigated. The aim of this work is to improve the performance of unsupervised batch adaptation using DNN-HMM. The proposed adaptation method is based on the cross-adaptation approach, in which complementary information derived from several systems is used. Gaussian mixture model HMM (GMM-HMM), DNN-HMM, and language model (LM) adaptation processes are conducted sequentially in the cross-adaptation procedure. The proposed adaptation method was evaluated on a Japanese lecture speech recognition task, where it reduced the error rate by 13.5% compared to the baseline DNN-HMM-based large vocabulary continuous speech recognition system.
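The sequential cross-adaptation idea in the abstract can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the function names (`cross_adapt`, `make_stub_adapter`), the stub recognizers, and the stage order (GMM-HMM, then DNN-HMM, then LM, taken from the order in which the abstract lists them) are all assumptions made for illustration. The only point shown is that each stage adapts on the hypotheses produced by the previous stage, with no reference transcripts.

```python
from typing import Callable, List, Sequence

# A recognizer maps a batch of utterance IDs to hypothesis transcripts.
Recognizer = Callable[[Sequence[str]], List[str]]
# An adapter uses the current hypotheses as unsupervised labels and
# returns an adapted recognizer (hypothetical interface).
Adapter = Callable[[Sequence[str], Sequence[str]], Recognizer]


def cross_adapt(utterances: Sequence[str],
                first_pass: Recognizer,
                adapters: Sequence[Adapter]) -> List[List[str]]:
    """Run adaptation stages in sequence; each stage is supervised by
    the hypotheses of the stage before it. Returns every pass's output."""
    hyps = first_pass(utterances)
    history = [hyps]
    for adapt in adapters:
        recognizer = adapt(utterances, hyps)  # adapt on previous output
        hyps = recognizer(utterances)         # re-decode the same batch
        history.append(hyps)
    return history


def make_stub_adapter(name: str) -> Adapter:
    """Stand-in for a real adaptation step. It only tags each hypothesis
    with the stage name so the data flow is visible."""
    def adapt(utterances: Sequence[str], hyps: Sequence[str]) -> Recognizer:
        def recognize(batch: Sequence[str]) -> List[str]:
            return [f"{h}>{name}" for h in hyps]
        return recognize
    return adapt


def baseline(batch: Sequence[str]) -> List[str]:
    """Stand-in for the unadapted first-pass recognizer."""
    return [f"{u}:baseline" for u in batch]


# Assumed stage order: GMM-HMM, DNN-HMM, LM.
stages = [make_stub_adapter("gmm"), make_stub_adapter("dnn"),
          make_stub_adapter("lm")]
history = cross_adapt(["utt1", "utt2"], baseline, stages)
print(history[-1])
```

In a real system each stub would be replaced by an actual adaptation step and a decoding pass; the sketch only fixes the order of operations and the use of earlier hypotheses as supervision.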
Pages: 4