Noisy training for deep neural networks in speech recognition

Cited by: 77
Authors
Yin, Shi [1, 4]
Liu, Chao [1, 3]
Zhang, Zhiyong [1, 2]
Lin, Yiye [1, 5]
Wang, Dong [1, 2]
Tejedor, Javier [6]
Zheng, Thomas Fang [1, 2]
Li, Yinguo [4]
Affiliations
[1] Tsinghua Univ, Res Inst Informat Technol, Ctr Speech & Language Technol, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Ctr Speech & Language Technol, Div Tech Innovat & Dev, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing 400065, Peoples R China
[5] Beijing Inst Technol, Beijing 100081, Peoples R China
[6] Univ Alcala, GEINTRA, Madrid, Spain
Funding
U.S. National Science Foundation;
Keywords
Speech recognition; Deep neural network; Noise injection; INJECTION; INPUTS;
DOI
10.1186/s13636-014-0047-0
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence miserable performance degradation in adverse acoustic conditions such as those with high ambient noises. We propose a noisy training approach to tackle this problem: by injecting moderate noises into the training data intentionally and randomly, more generalizable DNN models can be learned. This 'noise injection' technique, although known to the neural computation community already, has not been studied with DNNs which involve a highly complex objective function. The experiments presented in this paper confirm that the noisy training approach works well for the DNN model and can provide substantial performance improvement for DNN-based speech recognition.
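The abstract above describes injecting moderate noise into the training data "intentionally and randomly" but does not spell out the procedure. One common way to do this is to mix a randomly chosen noise recording into each utterance at a randomly chosen signal-to-noise ratio (SNR). The following is a minimal sketch under that assumption, not the authors' implementation; the function names `mix_at_snr` and `corrupt_randomly`, the waveform-level mixing, and the candidate SNR values are all hypothetical.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the result has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Tile, then trim, the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose gain g so that speech_power / (g**2 * noise_power) == 10 ** (snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def corrupt_randomly(speech, noises, snr_choices_db, rng):
    """Pick a noise recording and an SNR at random, then corrupt one utterance."""
    noise = noises[rng.integers(len(noises))]
    snr_db = snr_choices_db[rng.integers(len(snr_choices_db))]
    return mix_at_snr(speech, noise, snr_db), snr_db


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for a clean utterance and two noise recordings.
    clean = np.sin(np.linspace(0.0, 100.0, 1600))
    noise_bank = [rng.standard_normal(400), rng.standard_normal(900)]
    # Illustrative SNR candidates (an assumption, not taken from the paper).
    noisy, snr_db = corrupt_randomly(clean, noise_bank, [5.0, 10.0, 20.0], rng)
    print(noisy.shape, snr_db)
```

In a training loop, `corrupt_randomly` would be applied to each utterance before feature extraction, so that the network sees a different corruption of the same utterance across epochs.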
Pages: 1 / 14
Page count: 14
Related papers
50 in total
  • [1] Noisy training for deep neural networks in speech recognition
    Shi Yin
    Chao Liu
    Zhiyong Zhang
    Yiye Lin
    Dong Wang
    Javier Tejedor
    Thomas Fang Zheng
    Yinguo Li
    EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [2] FAST TRAINING OF DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
    Gong, Guojing
    Kingsbury, Brian
    Yang, Chih-Chieh
    Liu, Tianyi
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6884 - 6888
  • [3] NOISY TRAINING FOR DEEP NEURAL NETWORKS
    Meng, Xiangtao
    Liu, Chao
    Zhang, Zhiyong
    Wang, Dong
    2014 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (CHINASIP), 2014, : 16 - 20
  • [4] Investigating Factor Analysis Features for Deep Neural Networks In Noisy Speech Recognition
    Ganapathy, Sriram
    Thomas, Samuel
    Dimitriadis, Dimitrios
    Rennie, Steven
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1898 - 1902
  • [5] SYNTHESIZED STEREO MAPPING VIA DEEP NEURAL NETWORKS FOR NOISY SPEECH RECOGNITION
    Du, Jun
    Dai, Li-Rong
    Huo, Qiang
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [6] An Analysis of Deep Neural Networks in Broad Phonetic Classes for Noisy Speech Recognition
    de-la-Calle-Silos, F.
    Gallardo-Antolin, A.
    Pelaez-Moreno, C.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 87 - 96
  • [7] A Sequence Training Method for Deep Rectifier Neural Networks in Speech Recognition
    Grosz, Tamas
    Gosztolya, Gabor
    Toth, Laszlo
    SPEECH AND COMPUTER, 2014, 8773 : 81 - 88
  • [8] An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation
    Tong, Sibo
    Garner, Philip N.
    Bourlard, Herve
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 714 - 718
  • [9] Speech recognition in noisy environments with Convolutional Neural Networks
    Santos, Rafael M.
    Matos, Leonardo N.
    Macedo, Hendrik T.
    Montalvao, Jugurta
    2015 BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2015), 2015, : 175 - 179
  • [10] Evaluating robust features on Deep Neural Networks for speech recognition in noisy and channel mismatched conditions
    Mitra, Vikramjit
    Wang, Wen
    Franco, Horacio
    Lei, Yun
    Bartels, Chris
    Graciarena, Martin
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 895 - 899