DIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE

Cited by: 0
Authors
Xue, Shaofei [1]
Abdel-Hamid, Ossama [2]
Jiang, Hui [2]
Dai, Lirong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China
[2] Univ York, Dept Elect & Comp Engn, Toronto, ON, Canada
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
Deep Neural Network (DNN); Hybrid DNN-HMM; Speaker Code; Fast Speaker Adaptation; TRANSFORMATIONS;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Recently, an effective fast speaker adaptation method using discriminative speaker codes (SC) was proposed for hybrid DNN-HMM models in speech recognition [1]. That method depends on jointly learning a large generic adaptation neural network shared by all speakers together with multiple small speaker codes, using the standard back-propagation algorithm. In this paper, we propose an alternative, direct adaptation in model space, where speaker codes are connected directly to the original DNN model through a set of new connection weights, which can be estimated very efficiently from all or part of the training data. As a result, the proposed method is better suited to large-scale speech recognition tasks, since it eliminates the time-consuming training process needed to estimate a separate adaptation neural network. We evaluate the proposed direct SC-based adaptation method on the large-scale 320-hr Switchboard task. Experimental results show that the proposed SC-based rapid adaptation method is very effective not only for small recognition tasks but also for very large-scale tasks. For example, the proposed method yields up to 8% relative reduction in word error rate on Switchboard using only a very small number of adaptation utterances per speaker (from 10 to a few dozen). Moreover, the extra training time required for adaptation is significantly reduced compared with the method in [1].
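The direct connection described in the abstract can be pictured with a toy example. The sketch below is an illustration only, not the authors' implementation: the abstract gives no formulas, so the layer form h = sigmoid(W x + B s + b), the squared-error loss, and every name (`layer_forward`, `adapt_speaker_code`, `W`, `B`, `s`) are assumptions. It shows a speaker code `s` entering one layer through extra weights `B`, and adaptation that updates only `s` while the network weights stay frozen.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(W, B, b, x, s):
    """Assumed layer form: h = sigmoid(W x + B s + b).
    B s is the speaker-dependent term added by the speaker code s."""
    out = []
    for i in range(len(W)):
        z = b[i]
        z += sum(W[i][j] * x[j] for j in range(len(x)))
        z += sum(B[i][k] * s[k] for k in range(len(s)))
        out.append(sigmoid(z))
    return out

def squared_error(h, t):
    return 0.5 * sum((hi - ti) ** 2 for hi, ti in zip(h, t))

def total_loss(W, B, b, data, s):
    return sum(squared_error(layer_forward(W, B, b, x, s), t) for x, t in data)

def adapt_speaker_code(W, B, b, data, code_dim, lr=0.5, epochs=50):
    """Estimate a speaker code by gradient descent; W, B and b are frozen."""
    s = [0.0] * code_dim
    for _ in range(epochs):
        for x, t in data:
            h = layer_forward(W, B, b, x, s)
            # Error signal at the pre-activation of each unit.
            delta = [(h[i] - t[i]) * h[i] * (1.0 - h[i]) for i in range(len(h))]
            for k in range(code_dim):
                grad = sum(delta[i] * B[i][k] for i in range(len(h)))
                s[k] -= lr * grad
    return s

# Hypothetical toy values, chosen only to make the sketch runnable.
W = [[0.5, -0.3], [0.1, 0.8]]   # speaker-independent weights
B = [[1.0, 0.0], [0.0, 1.0]]    # speaker-code connection weights
b = [0.0, 0.0]
data = [([1.0, 0.0], [0.9, 0.2]), ([0.0, 1.0], [0.7, 0.6])]

before = total_loss(W, B, b, data, [0.0, 0.0])
s = adapt_speaker_code(W, B, b, data, code_dim=2)
after = total_loss(W, B, b, data, s)
```

With a zero speaker code the layer reduces to the unadapted one, which is why only the small vector `s` needs to be estimated from a speaker's adaptation utterances in this simplified picture.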
Pages: 5
Related papers
50 in total
  • [1] FAST SPEAKER ADAPTATION OF HYBRID NN/HMM MODEL FOR SPEECH RECOGNITION BASED ON DISCRIMINATIVE LEARNING OF SPEAKER CODE
    Abdel-Hamid, Ossama
    Jiang, Hui
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7942 - 7946
  • [2] Unsupervised Speaker Adaptation of BLSTM-RNN for LVCSR Based on Speaker Code
    Huang, Zhiying
    Xue, Shaofei
    Yan, Zhijie
    Dai, Lirong
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [3] Speaker adaptation technique for HMM model
    Kwong, S
    He, QH
    Man, KF
    Tang, KS
    ELECTRONICS LETTERS, 1999, 35 (21) : 1817 - 1818
  • [4] ONLINE SPEAKER ADAPTATION FOR LVCSR BASED ON ATTENTION MECHANISM
    Pan, Jia
    Liu, Diyuan
    Wan, Genshun
    Du, Jun
    Liu, Qingfeng
    Ye, Zhongfu
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 183 - 186
  • [5] Empirical Evaluation of Speaker Adaptation on DNN based Acoustic Model
    Wang, Ke
    Zhang, Junbo
    Wang, Yujun
    Xie, Lei
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2429 - 2433
  • [6] Speaker Adaptation Using Speaker Similarity Score on DNN Features
    Rizwan, Muhammad
    Anderson, David V.
    2015 IEEE 14TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2015, : 877 - 882
  • [7] SPEAKER ADAPTATION OF CONTINUOUS PARAMETER HMM
    1600, (The International Society for Computers and Their Applications (ISCA)):
  • [8] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
    Xue, Shaofei
    Jiang, Hui
    Dai, Lirong
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 1 - +
  • [9] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
    Shaofei Xue
    Hui Jiang
    Lirong Dai
    Qingfeng Liu
    Journal of Signal Processing Systems, 2016, 82 : 175 - 185
  • [10] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
    Xue, Shaofei
    Jiang, Hui
    Dai, Lirong
    Liu, Qingfeng
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (02): : 175 - 185