Noisy training for deep neural networks in speech recognition

Cited by: 77
Authors
Yin, Shi [1, 4]
Liu, Chao [1, 3]
Zhang, Zhiyong [1, 2]
Lin, Yiye [1, 5]
Wang, Dong [1, 2]
Tejedor, Javier [6]
Zheng, Thomas Fang [1, 2]
Li, Yinguo [4]
Affiliations
[1] Tsinghua Univ, Res Inst Informat Technol, Ctr Speech & Language Technol, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Ctr Speech & Language Technol, Div Tech Innovat & Dev, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing 400065, Peoples R China
[5] Beijing Inst Technol, Beijing 100081, Peoples R China
[6] Univ Alcala, GEINTRA, Madrid, Spain
Funding
National Science Foundation (United States);
Keywords
Speech recognition; Deep neural network; Noise injection; INJECTION; INPUTS;
DOI
10.1186/s13636-014-0047-0
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence miserable performance degradation in adverse acoustic conditions such as those with high ambient noises. We propose a noisy training approach to tackle this problem: by injecting moderate noises into the training data intentionally and randomly, more generalizable DNN models can be learned. This 'noise injection' technique, although known to the neural computation community already, has not been studied with DNNs which involve a highly complex objective function. The experiments presented in this paper confirm that the noisy training approach works well for the DNN model and can provide substantial performance improvement for DNN-based speech recognition.
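The noise injection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which noise types, signal-to-noise ratios (SNRs), or corruption probabilities were used, so the function names `mix_at_snr` and `corrupt_randomly` and all parameters below are hypothetical choices made for illustration only. The sketch mixes a randomly chosen noise signal into a clean training signal at a randomly chosen SNR.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0.0 or noise_power == 0.0:
        return list(clean)  # nothing to scale against; leave the signal unchanged
    # Choose gain g so that clean_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = math.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(clean, noise)]


def corrupt_randomly(clean, noise_bank, snr_choices_db, corrupt_prob, rng):
    """With probability `corrupt_prob`, mix in a random noise at a random SNR.

    All arguments are assumptions for this sketch; the source does not specify them.
    """
    if rng.random() >= corrupt_prob:
        return list(clean)  # keep this training example clean
    noise = rng.choice(noise_bank)
    snr_db = rng.choice(snr_choices_db)
    return mix_at_snr(clean, noise, snr_db)
```

In a training pipeline, a function like `corrupt_randomly` would be applied to each utterance before feature extraction, so the network sees a different mixture of clean and corrupted examples on every pass.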
Pages: 1 - 14
Page count: 14
Related papers
50 in total
  • [21] Training Maxout Neural Networks for Speech Recognition Tasks
    Prudnikov, Aleksey
    Korenevsky, Maxim
    TEXT, SPEECH, AND DIALOGUE, 2016, 9924 : 443 - 451
  • [22] Speech emotion recognition with deep convolutional neural networks
    Issa, Dias
    Demirci, M. Fatih
    Yazici, Adnan
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59
  • [23] RECURRENT DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Weng, Chao
    Yu, Dong
    Watanabe, Shinji
    Juang, Biing-Hwang
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [24] Emotional Speech Recognition Using Deep Neural Networks
    Trinh Van, Loan
    Dao Thi Le, Thuy
    Le Xuan, Thanh
    Castelli, Eric
    SENSORS, 2022, 22 (04)
  • [25] A NETWORK OF DEEP NEURAL NETWORKS FOR DISTANT SPEECH RECOGNITION
    Ravanelli, Mirco
    Brakel, Philemon
    Omologo, Maurizio
    Bengio, Yoshua
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4880 - 4884
  • [26] Deep Neural Networks for Acoustic Modeling in Speech Recognition
    Hinton, Geoffrey
    Deng, Li
    Yu, Dong
    Dahl, George E.
    Mohamed, Abdel-rahman
    Jaitly, Navdeep
    Senior, Andrew
    Vanhoucke, Vincent
    Nguyen, Patrick
    Sainath, Tara N.
    Kingsbury, Brian
    IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) : 82 - 97
  • [27] INVESTIGATING SPARSE DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
    Pironkov, Gueorgui
    Dupont, Stephane
    Dutoit, Thierry
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 124 - 129
  • [28] Mongolian Speech Recognition Based on Deep Neural Networks
    Zhang, Hui
    Bao, Feilong
    Gao, Guanglai
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA (CCL 2015), 2015, 9427 : 180 - 188
  • [29] Deep Neural Networks for Speech Enhancement in Complex-Noisy Environments
    Saleem, Nasir
    Khattak, Muhammad Irfan
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2020, 6 (01) : 84 - 90
  • [30] On Deep and Shallow Neural Networks in Speech Recognition from Speech Spectrum
    Zelinka, Jan
    Salajka, Petr
    Mueller, Ludek
    SPEECH AND COMPUTER (SPECOM 2015), 2015, 9319 : 301 - 308