INCOHERENT TRAINING OF DEEP NEURAL NETWORKS TO DE-CORRELATE BOTTLENECK FEATURES FOR SPEECH RECOGNITION

Cited by: 0
Authors
Bao, Yebo [1 ]
Jiang, Hui [2 ]
Dai, Lirong [1 ]
Liu, Cong [3 ]
Affiliations
[1] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230026, Anhui, Peoples R China
[2] Univ York, Dept Comp Sci & Engn, York YO10 5DD, N Yorkshire, England
[3] Anhui USTC iFlytek Co Ltd, iFlytek Res, Hefei, Peoples R China
Keywords
Deep neural networks (DNN); nonlinear dimensionality reduction; bottleneck features; incoherent training; large vocabulary continuous speech recognition (LVCSR)
DOI
Not available
CLC classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Recently, the hybrid model combining deep neural networks (DNN) with context-dependent HMMs has achieved dramatic gains over the conventional GMM/HMM method on many speech recognition tasks. In this paper, we study how to compete with the state-of-the-art DNN/HMM approach within the traditional GMM/HMM framework. Instead of using the DNN as an acoustic model, we use it as a front-end bottleneck (BN) feature extractor that de-correlates long feature vectors concatenated from several consecutive speech frames. More importantly, we propose two novel incoherent training methods that explicitly de-correlate BN features during DNN learning. The first method minimizes the coherence of the DNN weight matrices, while the second minimizes the correlation coefficients of the BN features computed over each mini-batch of training data. Experimental results on a 70-hour Mandarin transcription task and the 309-hour Switchboard task show that traditional GMM/HMMs using BN features yield performance comparable to DNN/HMMs, and that the proposed incoherent training provides a 2-3% additional gain over the baseline BN features. Finally, discriminatively trained GMM/HMMs using incoherently trained BN features consistently surpass the state-of-the-art DNN/HMMs on all evaluated tasks.
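To make the second idea from the abstract concrete, the sketch below shows one plausible way a per-mini-batch de-correlation penalty on BN features could be computed: form the correlation coefficient matrix of the bottleneck activations over the mini-batch and penalize its off-diagonal entries. This is an illustrative assumption, not the authors' exact formulation; the function name, normalization, and the 42-dimensional bottleneck size are hypothetical.

```python
# Minimal sketch of a mini-batch de-correlation penalty on bottleneck (BN)
# features, assuming a simple squared off-diagonal correlation criterion.
import numpy as np


def correlation_penalty(bn_features: np.ndarray) -> float:
    """Average squared off-diagonal correlation coefficient of BN features.

    bn_features: array of shape (batch_size, bn_dim) holding the bottleneck
    activations for one mini-batch. The exact weighting used in the paper
    may differ; this is only an illustrative stand-in.
    """
    # Correlation coefficient matrix across BN dimensions (bn_dim x bn_dim).
    corr = np.corrcoef(bn_features, rowvar=False)
    # Keep only cross-dimension correlations by zeroing the diagonal.
    off_diag = corr - np.diag(np.diag(corr))
    bn_dim = bn_features.shape[1]
    return float(np.sum(off_diag ** 2) / (bn_dim * (bn_dim - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.standard_normal((256, 42))  # hypothetical 42-dim BN layer
    print("penalty on random features:", correlation_penalty(batch))
```

In training, a term of this kind would be added to the usual cross-entropy objective so that the gradient pushes the bottleneck layer toward producing de-correlated features, which better match the diagonal-covariance assumption of GMM/HMM back-ends.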
Pages: 6980 - 6984
Page count: 5