Noisy training for deep neural networks in speech recognition

Cited by: 77
Authors
Yin, Shi [1, 4]
Liu, Chao [1, 3]
Zhang, Zhiyong [1, 2]
Lin, Yiye [1, 5]
Wang, Dong [1, 2]
Tejedor, Javier [6]
Zheng, Thomas Fang [1, 2]
Li, Yinguo [4]
Affiliations
[1] Tsinghua Univ, Res Inst Informat Technol, Ctr Speech & Language Technol, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Ctr Speech & Language Technol, Div Tech Innovat & Dev, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing 400065, Peoples R China
[5] Beijing Inst Technol, Beijing 100081, Peoples R China
[6] Univ Alcala, GEINTRA, Madrid, Spain
Funding
U.S. National Science Foundation;
Keywords
Speech recognition; Deep neural network; Noise injection; INJECTION; INPUTS;
DOI
10.1186/s13636-014-0047-0
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Deep neural networks (DNNs) have achieved remarkable success in speech recognition, partly owing to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence severe performance degradation in adverse acoustic conditions such as those with high ambient noise. We propose a noisy training approach to tackle this problem: by injecting moderate noise into the training data intentionally and randomly, more generalizable DNN models can be learned. This 'noise injection' technique, although already known to the neural computation community, has not been studied with DNNs, which involve a highly complex objective function. The experiments presented in this paper confirm that the noisy training approach works well for the DNN model and can provide substantial performance improvement for DNN-based speech recognition.
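The noise injection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function names (`mix_at_snr`, `corrupt_randomly`), the `p_clean` parameter, the uniform choice of noise and SNR, and the assumption that each noise recording is at least as long as the utterance are all hypothetical choices made for illustration. Signals are plain lists of float samples.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR (dB)."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0.0 or noise_power == 0.0:
        return list(speech)  # nothing to scale against; leave the signal clean
    # The scaled noise should have power speech_power / 10^(snr_db / 10).
    scale = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def corrupt_randomly(speech, noise_bank, snr_choices_db, p_clean=0.2, rng=random):
    """Keep the utterance clean with probability p_clean (assumed value);
    otherwise add a randomly chosen noise at a randomly chosen SNR."""
    if rng.random() < p_clean:
        return list(speech)
    noise = rng.choice(noise_bank)
    snr_db = rng.choice(snr_choices_db)
    # Assumes the noise recording is at least as long as the utterance.
    return mix_at_snr(speech, noise[: len(speech)], snr_db)
```

In a training pipeline, `corrupt_randomly` would typically be applied to each utterance before feature extraction, so that the network sees a different mix of clean and noisy inputs over the course of training.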
Pages: 1 / 14
Page count: 14
Related papers
50 items in total
  • [31] INCOHERENT TRAINING OF DEEP NEURAL NETWORKS TO DE-CORRELATE BOTTLENECK FEATURES FOR SPEECH RECOGNITION
    Bao, Yebo
    Jiang, Hui
    Dai, Lirong
    Liu, Cong
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6980 - 6984
  • [32] AN ANALYSIS OF THE ROBUSTNESS OF DEEP FACE RECOGNITION NETWORKS TO NOISY TRAINING LABELS
    Reale, Christopher
    Nasrabadi, Nasser M.
    Chellappa, Rama
    2016 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2016, : 1192 - 1196
  • [33] Robust Noisy Speech Recognition Using Deep Neural Support Vector Machines
    Amami, Rimah
    Ben Ayed, Dorra
    DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 2019, 800 : 300 - 307
  • [34] EXPLORING DEEP NEURAL NETWORKS AND DEEP AUTOENCODERS IN REVERBERANT SPEECH RECOGNITION
    Mimura, Masato
    Sakai, Shinsuke
    Kawahara, Tatsuya
    2014 4TH JOINT WORKSHOP ON HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS (HSCMA), 2014, : 197 - 201
  • [35] Regularized sparse features for noisy speech enhancement using deep neural networks
    Khattak, Muhammad Irfan
    Saleem, Nasir
    Gao, Jiechao
    Verdu, Elena
    Fuente, Javier Parra
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 100
  • [36] EXPLOITING LSTM STRUCTURE IN DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
    He, Tianxing
    Droppo, Jasha
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5445 - 5449
  • [37] FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition
    Dossou, Bonaventure F. P.
    Gbenou, Yeno K. S.
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3526 - 3531
  • [38] VERY DEEP CONVOLUTIONAL NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Qian, Yanmin
    Woodland, Philip C.
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 481 - 488
  • [39] Automatic Recognition of Kazakh Speech Using Deep Neural Networks
    Mamyrbayev, Orken
    Turdalyuly, Mussa
    Mekebayev, Nurbapa
    Alimhan, Keylan
    Kydyrbekova, Aizat
    Turdalykyzy, Tolganay
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2019, PT II, 2019, 11432 : 465 - 474
  • [40] Comparative Analysis of Deep Recurrent Neural Networks for Speech Recognition
    Atosha, Pascal Bahavu
    Ozbilge, Emre
    Kirsal, Yonal
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024,