Unsupervised Speaker Adaptation of BLSTM-RNN for LVCSR Based on Speaker Code

Cited by: 0
Authors
Huang, Zhiying [1]
Xue, Shaofei [2]
Yan, Zhijie [2]
Dai, Lirong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] Alibaba Inc, Hefei, Anhui, Peoples R China
Source
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2016
Keywords
BLSTM-RNN; LVCSR; speaker adaptation; speaker code; normalization; singular value decomposition; LINEAR-REGRESSION; TRANSFORMATIONS
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Recently, speaker code based adaptation has been successfully extended to recurrent neural networks using bidirectional Long Short-Term Memory (BLSTM-RNN) [1]. Experiments on the small-scale TIMIT task have demonstrated that speaker code based adaptation is also valid for BLSTM-RNN. In this paper, we evaluate this method on a large-scale task and introduce an error normalization method to balance the back-propagation errors that the speaker codes receive from different layers. We also use singular value decomposition (SVD) for model compression. Results show that speaker code based adaptation with SVD achieves better recognition performance than i-vector based speaker adaptation of the same dimension. Experimental results on the Switchboard task show that speaker code based adaptation on the hybrid BLSTM-DNN topology achieves more than 9% relative reduction in word error rate (WER) compared to the speaker-independent (SI) baseline.
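The abstract reports its headline result as a relative, not absolute, reduction in WER. As a minimal sketch of that arithmetic, the function below computes a relative reduction from two WER values. The function name `relative_wer_reduction` and the numbers are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction of an adapted system over a baseline.

    Both arguments are word error rates on the same scale (for example, percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WERs in percent, used only to illustrate the arithmetic.
# They are NOT results reported in the paper.
si_wer = 20.0        # assumed speaker-independent baseline
adapted_wer = 18.0   # assumed speaker-adapted system

reduction = relative_wer_reduction(si_wer, adapted_wer)
print(f"relative WER reduction: {reduction:.1%}")
```

With these assumed values, an absolute drop of 2 points corresponds to a 10% relative reduction, which is the sense in which "more than 9% relative reduction" should be read.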
Pages: 5