A Initial Attempt on Task-Specific Adaptation for Deep Neural Network-based Large Vocabulary Continuous Speech Recognition

Cited by: 0

Authors:

Xiao, Yeming ^{[1]}

Zhang, Zhen ^{[1]}

Cai, Shang ^{[1]}

Pan, Jielin ^{[1]}

Yan, Yonghong ^{[1]}

Affiliations:

[1] Chinese Acad Sci, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China

Source:

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012

Keywords:

deep neural network; pre-training; speaker adaptation; LVCSR;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In state-of-the-art automatic speech recognition (ASR) systems, adaptation techniques are used to mitigate the performance degradation caused by the mismatch between training and testing conditions. Although there are many adaptation techniques for hidden Markov model (HMM)/GMM-based systems [3], there is little work on adaptation in hybrid artificial neural network (ANN)/HMM-based systems [7][8]. Recently, there has been a resurgence of the ANN/HMM scheme for ASR, driven by the success of the context-dependent deep neural network HMM (CD-DNN/HMM). In this paper, we therefore present our initial efforts on adaptation techniques for the CD-DNN/HMM system. Specifically, a linear input network (LIN)-based method and a neural network retraining (NNR)-based method are experimentally explored for task adaptation. Experiments on a conversational telephone speech data set show that these techniques can improve the system significantly, and that the LIN-based method seems to work better with a medium amount of adaptation data.


Pages: 2573-2576

Page count: 4

References (16 in total):

[1] [Anonymous], 2011 IEEE WORKSH AUT
[2] [Anonymous], ACOUSTICS SPEECH SIG
[3] [Anonymous], 1995 ABBOT LVCSR SYS
[4] [Anonymous], SIGNAL PROCESSING LE
[5] [Anonymous], EUROSPEECH
[6] [Anonymous], CONVERSATIONAL SPEEC
[7] [Anonymous], P 2001 INT JOINT C N
[8] [Anonymous], 1994, Connectionist Speech Recognition: A Hybrid Approach
[9] [Anonymous], AC SPEECH SIGN PROC
[10] [Anonymous], SPEECH AUDIO PROCESS
