DIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE

Times cited: 0

Authors:

Xue, Shaofei [1]

Abdel-Hamid, Ossama [2]

Jiang, Hui [2]

Dai, Lirong [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China

[2] York Univ, Dept Elect & Comp Engn, Toronto, ON, Canada

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

Deep Neural Network (DNN); Hybrid DNN-HMM; Speaker Code; Fast Speaker Adaptation; TRANSFORMATIONS;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Recently, an effective fast speaker adaptation method using discriminative speaker codes (SC) has been proposed for hybrid DNN-HMM models in speech recognition [1]. That method relies on jointly learning a large generic adaptation neural network shared by all speakers, together with multiple small speaker codes, using the standard back-propagation algorithm. In this paper, we propose an alternative direct adaptation in model space, in which speaker codes are connected directly to the original DNN model through a set of new connection weights that can be estimated very efficiently from all or part of the training data. As a result, the proposed method is better suited to large-scale speech recognition tasks, since it eliminates the time-consuming process of training a separate adaptation neural network. We evaluate the proposed direct SC-based adaptation method on the large-scale 320-hour Switchboard task. Experimental results show that SC-based rapid adaptation is effective not only for small recognition tasks but also for very large-scale ones: for example, the proposed method yields up to an 8% relative reduction in word error rate on Switchboard using only a small number of adaptation utterances per speaker (from 10 to a few dozen). Moreover, the extra training time required for adaptation is significantly reduced compared with the method in [1].
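The direct adaptation described in the abstract can be illustrated with a minimal NumPy sketch. It assumes, purely for illustration and not as the authors' exact configuration, a sigmoid DNN in which a speaker code vector `s` is fed to every hidden layer through new connection weights `B_l`, so that each layer computes h_l = sigmoid(W_l h_{l-1} + B_l s + b_l), followed by a softmax over HMM states. Random weights stand in for a trained DNN and for already-estimated `B_l`; the function names (`init_model`, `loss_and_code_grad`, `adapt_speaker_code`) and all sizes are hypothetical. Only the per-speaker step is shown: with every network weight frozen, a new speaker's code is estimated by gradient descent on frame-level cross-entropy over that speaker's adaptation frames. How the `B_l` weights are learned from the training data is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_model(rng, dims, code_dim, n_states):
    """Random stand-in for a trained DNN (dims = [input, hidden1, ...]) plus
    hypothetical speaker-code weights B feeding every hidden layer."""
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        W = rng.normal(0.0, 0.1, (d_out, d_in))      # original DNN weights
        B = rng.normal(0.0, 0.1, (d_out, code_dim))  # new speaker-code connections
        b = np.zeros(d_out)
        layers.append((W, B, b))
    V = rng.normal(0.0, 0.1, (n_states, dims[-1]))   # output (HMM-state) layer
    c = np.zeros(n_states)
    return layers, (V, c)


def loss_and_code_grad(X, y, s, layers, out):
    """Mean frame cross-entropy and its gradient w.r.t. the speaker code s only."""
    hs = [X]
    for W, B, b in layers:
        # Original connection W plus the speaker-code connection B (s broadcasts over frames)
        hs.append(sigmoid(hs[-1] @ W.T + s @ B.T + b))
    V, c = out
    z = hs[-1] @ V.T + c
    z -= z.max(axis=1, keepdims=True)                # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    n = X.shape[0]
    loss = -np.mean(np.log(p[np.arange(n), y]))

    # Back-propagate to s; weights are frozen, so their gradients are not formed
    dz = p.copy()
    dz[np.arange(n), y] -= 1.0
    dz /= n
    dh = dz @ V
    grad_s = np.zeros_like(s)
    for (W, B, _), h in zip(reversed(layers), reversed(hs[1:])):
        da = dh * h * (1.0 - h)                      # through the sigmoid
        grad_s += (da @ B).sum(axis=0)               # every layer contributes via its B
        dh = da @ W                                  # gradient w.r.t. the layer below
    return loss, grad_s


def adapt_speaker_code(X, y, layers, out, code_dim, steps=200, lr=1.0):
    """Estimate one new speaker's code by plain gradient descent, DNN weights fixed."""
    s = np.zeros(code_dim)
    for _ in range(steps):
        _, g = loss_and_code_grad(X, y, s, layers, out)
        s -= lr * g
    return s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers, out = init_model(rng, dims=[20, 32, 32], code_dim=8, n_states=5)
    X = rng.normal(size=(30, 20))      # 30 hypothetical adaptation frames
    y = rng.integers(0, 5, size=30)    # hypothetical state labels (e.g. from a first-pass decode)
    before, _ = loss_and_code_grad(X, y, np.zeros(8), layers, out)
    s = adapt_speaker_code(X, y, layers, out, code_dim=8)
    after, _ = loss_and_code_grad(X, y, s, layers, out)
    print(f"cross-entropy before {before:.4f}, after {after:.4f}")
```

Because only the low-dimensional code `s` is optimized for each new speaker, a handful of utterances can suffice to estimate it without retuning the much larger DNN; this is the usual motivation for code-based fast adaptation.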


Pages: 5

50 records in total

[1] FAST SPEAKER ADAPTATION OF HYBRID NN/HMM MODEL FOR SPEECH RECOGNITION BASED ON DISCRIMINATIVE LEARNING OF SPEAKER CODE
Abdel-Hamid, Ossama
Jiang, Hui
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, pp. 7942-7946
[2] Unsupervised Speaker Adaptation of BLSTM-RNN for LVCSR Based on Speaker Code
Huang, Zhiying
Xue, Shaofei
Yan, Zhijie
Dai, Lirong
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016
[3] Speaker adaptation technique for HMM model
Kwong, S
He, QH
Man, KF
Tang, KS
ELECTRONICS LETTERS, 1999, 35 (21): 1817-1818
[4] ONLINE SPEAKER ADAPTATION FOR LVCSR BASED ON ATTENTION MECHANISM
Pan, Jia
Liu, Diyuan
Wan, Genshun
Du, Jun
Liu, Qingfeng
Ye, Zhongfu
2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, pp. 183-186
[5] Empirical Evaluation of Speaker Adaptation on DNN based Acoustic Model
Wang, Ke
Zhang, Junbo
Wang, Yujun
Xie, Lei
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, pp. 2429-2433
[6] Speaker Adaptation Using Speaker Similarity Score on DNN Features
Rizwan, Muhammad
Anderson, David V.
2015 IEEE 14TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2015, pp. 877-882
[7] SPEAKER ADAPTATION OF CONTINUOUS PARAMETER HMM
1600, (The International Society for Computers and Their Applications (ISCA)):
[8] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
Xue, Shaofei
Jiang, Hui
Dai, Lirong
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, p. 1+
[9] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
Xue, Shaofei
Jiang, Hui
Dai, Lirong
Liu, Qingfeng
JOURNAL OF SIGNAL PROCESSING SYSTEMS, 2016, 82: 175-185
[10] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
Xue, Shaofei
Jiang, Hui
Dai, Lirong
Liu, Qingfeng
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (2): 175-185
