Noisy training for deep neural networks in speech recognition

Cited by: 77
Authors
Yin, Shi [1, 4]
Liu, Chao [1, 3]
Zhang, Zhiyong [1, 2]
Lin, Yiye [1, 5]
Wang, Dong [1, 2]
Tejedor, Javier [6]
Zheng, Thomas Fang [1, 2]
Li, Yinguo [4]
Affiliations
[1] Tsinghua Univ, Res Inst Informat Technol, Ctr Speech & Language Technol, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Ctr Speech & Language Technol, Div Tech Innovat & Dev, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing 400065, Peoples R China
[5] Beijing Inst Technol, Beijing 100081, Peoples R China
[6] Univ Alcala, GEINTRA, Madrid, Spain
Funding
US National Science Foundation;
Keywords
Speech recognition; Deep neural network; Noise injection; INJECTION; INPUTS;
DOI
10.1186/s13636-014-0047-0
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence miserable performance degradation in adverse acoustic conditions such as those with high ambient noises. We propose a noisy training approach to tackle this problem: by injecting moderate noises into the training data intentionally and randomly, more generalizable DNN models can be learned. This 'noise injection' technique, although known to the neural computation community already, has not been studied with DNNs which involve a highly complex objective function. The experiments presented in this paper confirm that the noisy training approach works well for the DNN model and can provide substantial performance improvement for DNN-based speech recognition.
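The noise-injection idea in the abstract can be illustrated with a minimal sketch. It assumes raw waveforms held in NumPy arrays and a small pool of noise recordings; the function names, the uniform SNR range, and the probability of leaving an utterance clean are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Tile or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if clean_power == 0.0 or noise_power == 0.0:
        # Nothing meaningful to scale; return the utterance unchanged.
        return clean.copy()
    # Solve snr_db = 10 * log10(clean_power / (gain**2 * noise_power)) for gain.
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


def corrupt_randomly(clean, noises, rng, p_clean=0.5, snr_range=(5.0, 20.0)):
    """Randomly keep an utterance clean or mix in one noise at a random SNR.

    `p_clean` and `snr_range` are illustrative assumptions, not values from the paper.
    """
    if rng.random() < p_clean:
        return np.asarray(clean, dtype=np.float64).copy()
    noise = noises[rng.integers(len(noises))]
    snr_db = rng.uniform(*snr_range)
    return mix_at_snr(clean, noise, snr_db)
```

In a training loop, `corrupt_randomly` would be applied to each utterance before feature extraction, so the network sees a different noise condition every time the same utterance is drawn.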
Pages: 1 / 14
Number of pages: 14
Related papers
50 in total
  • [41] Acceleration Strategies for Speech Recognition based on Deep Neural Networks
    Tian, Chao
    Liu, Jia
    Peng, Zhaomeng
    MECHATRONICS ENGINEERING, COMPUTING AND INFORMATION TECHNOLOGY, 2014, 556-562 : 5181 - 5185
  • [42] Speech Recognition Using Deep Neural Networks: A Systematic Review
    Nassif, Ali Bou
    Shahin, Ismail
    Attili, Imtinan
    Azzeh, Mohammad
    Shaalan, Khaled
    IEEE ACCESS, 2019, 7 : 19143 - 19165
  • [43] AN INVESTIGATION OF DEEP NEURAL NETWORKS FOR NOISE ROBUST SPEECH RECOGNITION
    Seltzer, Michael L.
    Yu, Dong
    Wang, Yongqiang
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7398 - 7402
  • [44] Training Neural Networks on Noisy Data
    Rusiecki, Andrzej
    Kordos, Miroslaw
    Kaminski, Tomasz
    Gren, Krzysztof
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING ICAISC 2014, PT I, 2014, 8467 : 131 - 142
  • [45] Speech Emotion Recognition using Convolution Neural Networks and Deep Stride Convolutional Neural Networks
    Wani, Taiba Majid
    Gunawan, Teddy Surya
    Qadri, Syed Asif Ahmad
    Mansor, Hasmah
    Kartiwi, Mira
    Ismail, Nanang
    PROCEEDING OF 2020 6TH INTERNATIONAL CONFERENCE ON WIRELESS AND TELEMATICS (ICWT), 2020,
  • [46] Hierarchical singleton-type recurrent neural fuzzy networks for noisy speech recognition
    Juang, Chia-Feng
    Chiou, Chyi-Tian
    Lai, Chun-Lung
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2007, 18 (03): : 833 - 843
  • [47] Training Deep Neural Networks for Image Applications with Noisy Labels by Complementary Learning
    Zhou Y.
    Liu Y.
    Wang R.
    2017, Science Press (54): : 2649 - 2659
  • [48] Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
    Zhang, Zhilu
    Sabuncu, Mert R.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [49] Robust Training of Deep Neural Networks with Noisy Labels by Graph Label Propagation
    Nomura, Yuichiro
    Kurita, Takio
    FRONTIERS OF COMPUTER VISION, IW-FCV 2021, 2021, 1405 : 281 - 293
  • [50] A discriminative and robust training algorithm for noisy speech recognition
    Hong, WT
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 8 - 11