Unsupervised Speaker Adaptation of BLSTM-RNN for LVCSR Based on Speaker Code

Authors
Huang, Zhiying [1]
Xue, Shaofei [2]
Yan, Zhijie [2]
Dai, Lirong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] Alibaba Inc, Hefei, Anhui, Peoples R China
Source
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2016
Keywords
BLSTM-RNN; LVCSR; speaker adaptation; speaker code; normalization; singular value decomposition; LINEAR-REGRESSION; TRANSFORMATIONS;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Recently, speaker code based adaptation has been successfully extended to recurrent neural networks using bidirectional Long Short-Term Memory (BLSTM-RNN) [1]. Experiments on the small-scale TIMIT task have demonstrated that speaker code based adaptation is also valid for BLSTM-RNN. In this paper, we evaluate this method on a large-scale task and introduce an error normalization method to balance the back-propagation errors derived from different layers for speaker codes. We also use singular value decomposition (SVD) to compress the model. Speaker code based adaptation with SVD yields better recognition performance than i-vector based speaker adaptation of the same dimension. Experimental results on the Switchboard task show that speaker code based adaptation on the hybrid BLSTM-DNN topology achieves more than 9% relative reduction in word error rate (WER) compared to the speaker-independent (SI) baseline.
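The abstract mentions SVD-based model compression without giving details. As a rough illustration only, the sketch below shows the general idea of replacing one weight matrix with two low-rank factors obtained from a truncated SVD. The matrix size (512 x 256), the rank (64), and the helper name `svd_compress` are assumptions made for this example; the record does not say which matrices the authors factorize or what rank they keep.

```python
import numpy as np


def svd_compress(weight, rank):
    """Hypothetical helper: factor an (m x n) matrix into (m x rank) @ (rank x n)."""
    # Thin SVD: weight = u @ diag(s) @ vt, singular values sorted in descending order.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Keep the top `rank` components and absorb the singular values into the left factor.
    left = u[:, :rank] * s[:rank]
    right = vt[:rank, :]
    return left, right


# Assumed layer size and rank, chosen only for illustration.
rng = np.random.default_rng(0)
weight = rng.standard_normal((512, 256))
left, right = svd_compress(weight, rank=64)

params_before = weight.size             # 512 * 256 = 131072
params_after = left.size + right.size   # 512 * 64 + 64 * 256 = 49152
rel_error = np.linalg.norm(weight - left @ right) / np.linalg.norm(weight)
print(params_before, params_after, rel_error)
```

With these assumed sizes the two factors hold 49152 parameters instead of 131072. A random matrix like this one is not close to low rank, so the approximation error here is large; the approach is only useful when the trained weights have rapidly decaying singular values.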
Pages: 5
Related papers
50 in total
  • [1] Rapid Speaker Adaptation Based on D-code Extracted from BLSTM-RNN in LVCSR
    Xue, Shaofei
    Yan, Zhijie
    Huang, Zhiying
    Dai, Lirong
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [2] SPEAKER ADAPTATION OF RNN-BLSTM FOR SPEECH RECOGNITION BASED ON SPEAKER CODE
    Huang, Zhiying
    Tang, Jian
    Xue, Shaofei
    Dai, Lirong
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5305 - 5309
  • [3] Speaker Representations for Speaker Adaptation in Multiple Speakers' BLSTM-RNN-based Speech Synthesis
    Zhao, Yi
    Saito, Daisuke
    Minematsu, Nobuaki
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2268 - 2272
  • [4] DIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE
    Xue, Shaofei
    Abdel-Hamid, Ossama
    Jiang, Hui
    Dai, Lirong
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [5] ONLINE SPEAKER ADAPTATION FOR LVCSR BASED ON ATTENTION MECHANISM
    Pan, Jia
    Liu, Diyuan
    Wan, Genshun
    Du, Jun
    Liu, Qingfeng
    Ye, Zhongfu
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 183 - 186
  • [6] BLSTM-RNN Based 3D Gesture Classification
    Lefebvre, Gregoire
    Berlemont, Samuel
    Mamalet, Franck
    Garcia, Christophe
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2013, 2013, 8131 : 381 - 388
  • [7] Unsupervised speaker adaptation using reference speaker weighting
    Lai, Tsz-Chung
    Mak, Brian
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 380 - +
  • [8] Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification
    Shum, Stephen
    Dehak, Najim
    Dehak, Reda
    Glass, James R.
    ODYSSEY 2010: THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP, 2010, : 76 - 82
  • [9] Rapid Unsupervised Speaker Adaptation Using Single Utterance Based on MLLR and Speaker Selection
    Gomez, Randy
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1365 - 1368
  • [10] UNSUPERVISED SPEAKER ADAPTATION BASED ON HIERARCHICAL SPECTRAL CLUSTERING
    FURUI, S
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1989, 37 (12): : 1923 - 1930