INCOHERENT TRAINING OF DEEP NEURAL NETWORKS TO DE-CORRELATE BOTTLENECK FEATURES FOR SPEECH RECOGNITION

被引:0
|
作者
Bao, Yebo [1 ]
Jiang, Hui [2 ]
Dai, Lirong [1 ]
Liu, Cong [3 ]
机构
[1] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230026, Anhui, Peoples R China
[2] Univ York, Dept Comp Sci & Engn, York YO10 5DD, N Yorkshire, England
[3] Anhui USTC iFlytek Co Ltd, iFlytek Res, Hefei, Peoples R China
关键词
Deep neural networks (DNN); nonlinear dimensionality reduction; bottleneck features; incoherent training; large vocabulary continuous speech recognition (LVCSR);
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Recently, the hybrid model combining deep neural network (DNN) with context-dependent HMMs has achieved some dramatic gains over the conventional GMM/HMM method in many speech recognition tasks. In this paper, we study how to compete with the state-of-the-art DNN/HMM method under the traditional GMM/HMM framework. Instead of using DNN as acoustic model, we use DNN as a front-end bottleneck (BN) feature extraction method to de-correlate long feature vectors concatenated from several consecutive speech frames. More importantly, we have proposed two novel incoherent training methods to explicitly de-correlate BN features in learning of DNN. The first method relies on minimizing coherence of weight matrices in DNN while the second one attempts to minimize correlation coefficients of BN features calculated in each mini-batch data in DNN training. Experimental results on a 70-hr Mandarin transcription task and the 309-hr Switchboard task have shown that the traditional GMM/HMMs using BN features can yield comparable performance as DNN/HMM. The proposed incoherent training can produce 2-3% additional gain over the baseline BN features. At last, the discriminatively trained GMM/HMMs using incoherently trained BN features have consistently surpassed the state-of-the-art DNN/HMMs in all evaluated tasks.
引用
收藏
页码:6980 / 6984
页数:5
相关论文
共 50 条
  • [21] Deep Neural Networks in Russian Speech Recognition
    Markovnikov, Nikita
    Kipyatkova, Irina
    Karpov, Alexey
    Filchenkov, Andrey
    ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE, 2018, 789 : 54 - 67
  • [22] DEEP MAXOUT NEURAL NETWORKS FOR SPEECH RECOGNITION
    Cai, Meng
    Shi, Yongzhe
    Liu, Jia
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 291 - 296
  • [23] Deep Segmental Neural Networks for Speech Recognition
    Abdel-Hamid, Ossama
    Deng, Li
    Yu, Dong
    Jiang, Hui
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1848 - 1852
  • [24] SPEECH RECOGNITION WITH DEEP RECURRENT NEURAL NETWORKS
    Graves, Alex
    Mohamed, Abdel-rahman
    Hinton, Geoffrey
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6645 - 6649
  • [25] Binary Deep Neural Networks for Speech Recognition
    Xiang, Xu
    Qian, Yanmin
    Yu, Kai
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 533 - 537
  • [26] LEARNING FEATURE MAPPING USING DEEP NEURAL NETWORK BOTTLENECK FEATURES FOR DISTANT LARGE VOCABULARY SPEECH RECOGNITION
    Himawan, Ivan
    Motlicek, Petr
    Imseng, David
    Potard, Blaise
    Kim, Namhoon
    Lee, Jaewon
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4540 - 4544
  • [27] Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
    Espana-Bonet, Cristina
    Fonollosa, Jose A. R.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 97 - 107
  • [28] Robust Speech Recognition with Speech Enhanced Deep Neural Networks
    Du, Jun
    Wang, Qing
    Gao, Tian
    Xu, Yong
    Dai, Lirong
    Lee, Chin-Hui
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 616 - 620
  • [29] Emotion recognition from speech using deep recurrent neural networks with acoustic features
    Byun, Sung-Woo
    Shin, Bo-Ra
    Lee, Seok-Pil
    Han, Hyuk-Soo
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2018, 123 : 43 - 44
  • [30] Training Maxout Neural Networks for Speech Recognition Tasks
    Prudnikov, Aleksey
    Korenevsky, Maxim
    TEXT, SPEECH, AND DIALOGUE, 2016, 9924 : 443 - 451