Unsupervised Speaker Adaptation of BLSTM-RNN for LVCSR Based on Speaker Code

Cited by: 0
Authors
Huang, Zhiying [1]
Xue, Shaofei [2]
Yan, Zhijie [2]
Dai, Lirong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] Alibaba Inc, Hefei, Anhui, Peoples R China
Source
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2016
Keywords
BLSTM-RNN; LVCSR; speaker adaptation; speaker code; normalization; singular value decomposition; LINEAR-REGRESSION; TRANSFORMATIONS;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Recently, speaker code based adaptation has been successfully extended to recurrent neural networks using bidirectional Long Short-Term Memory (BLSTM-RNN) [1]. Experiments on the small-scale TIMIT task have demonstrated that speaker code based adaptation is also valid for BLSTM-RNN. In this paper, we evaluate this method on a large-scale task and introduce an error normalization method to balance the back-propagation errors derived from different layers for speaker codes. Meanwhile, we use singular value decomposition (SVD) to compress the model. Results show that speaker code based adaptation with SVD achieves better recognition performance than i-vector based speaker adaptation of the same dimension. Experimental results on the Switchboard task show that speaker code based adaptation on the hybrid BLSTM-DNN topology can achieve more than 9% relative reduction in word error rate (WER) compared to the speaker-independent (SI) baseline.
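The abstract does not say which weight matrices are factored or what rank is kept, so the following is only a generic sketch of SVD-based low-rank compression, not the authors' procedure. The function name `svd_compress`, the 512 x 512 layer size, and the rank of 64 are all assumptions chosen for illustration.

```python
import numpy as np


def svd_compress(weight: np.ndarray, rank: int):
    """Factor `weight` (m x n) into a (m x rank) and b (rank x n) by truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # scale the kept left singular vectors by their singular values
    b = vt[:rank, :]            # kept right singular vectors
    return a, b


rng = np.random.default_rng(0)
# Hypothetical layer: a 512 x 512 matrix built to have rank exactly 64.
weight = rng.standard_normal((512, 64)) @ rng.standard_normal((64, 512))

a, b = svd_compress(weight, rank=64)
params_before = weight.size        # 512 * 512 parameters in the dense layer
params_after = a.size + b.size     # 2 * 512 * 64 parameters in the two factors
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)
```

Replacing one m x n layer with two layers of sizes m x rank and rank x n reduces the parameter count whenever rank < m*n / (m + n). On a real trained layer the truncation is lossy, so the rank is normally tuned against recognition accuracy.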
Pages: 5