Noisy training for deep neural networks in speech recognition

Cited by: 77
Authors
Yin, Shi [1, 4]
Liu, Chao [1, 3]
Zhang, Zhiyong [1, 2]
Lin, Yiye [1, 5]
Wang, Dong [1, 2]
Tejedor, Javier [6]
Zheng, Thomas Fang [1, 2]
Li, Yinguo [4]
Affiliations
[1] Tsinghua Univ, Res Inst Informat Technol, Ctr Speech & Language Technol, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Ctr Speech & Language Technol, Div Tech Innovat & Dev, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing 400065, Peoples R China
[5] Beijing Inst Technol, Beijing 100081, Peoples R China
[6] Univ Alcala, GEINTRA, Madrid, Spain
Source
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING | 2015
Funding
National Science Foundation (USA);
Keywords
Speech recognition; Deep neural network; Noise injection; INJECTION; INPUTS;
DOI
10.1186/s13636-014-0047-0
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence miserable performance degradation in adverse acoustic conditions such as those with high ambient noises. We propose a noisy training approach to tackle this problem: by injecting moderate noises into the training data intentionally and randomly, more generalizable DNN models can be learned. This 'noise injection' technique, although known to the neural computation community already, has not been studied with DNNs which involve a highly complex objective function. The experiments presented in this paper confirm that the noisy training approach works well for the DNN model and can provide substantial performance improvement for DNN-based speech recognition.
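The noise injection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the noise types, the SNR levels, or how often an utterance is corrupted, so `noise_bank`, `snr_choices_db`, `p_clean`, and both function names are hypothetical choices made for illustration. The sketch corrupts one training waveform by mixing in a randomly chosen noise signal at a randomly chosen signal-to-noise ratio, and sometimes leaves the waveform clean.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    # Loop (or truncate) the noise so it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    # Choose gain so that power(speech) / power(gain * tiled) == 10 ** (snr_db / 10).
    gain = math.sqrt(power(speech) / (power(tiled) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, tiled)]


def inject_noise(speech, noise_bank, snr_choices_db, rng, p_clean=0.2):
    """Randomly corrupt one training utterance (all parameters are assumptions).

    With probability p_clean the utterance is returned unchanged; otherwise a
    noise signal and an SNR are drawn uniformly from the given choices.
    """
    if rng.random() < p_clean:
        return list(speech)
    noise = rng.choice(noise_bank)
    snr_db = rng.choice(snr_choices_db)
    return mix_at_snr(speech, noise, snr_db)


# Synthetic stand-ins for real recordings: a sine tone and two white-noise clips.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise_bank = [
    [rng.gauss(0.0, 1.0) for _ in range(800)],
    [rng.gauss(0.0, 1.0) for _ in range(2000)],
]
noisy = inject_noise(speech, noise_bank, [5.0, 10.0, 15.0], rng)
```

In a training pipeline, a function like `inject_noise` would typically be applied to each utterance before feature extraction, so that the network sees a different corrupted version of the data over time. Whether that matches the procedure used in the paper cannot be determined from the abstract alone.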
Pages: 1-14
Page count: 14
Related papers
50 in total
  • [1] Noisy training for deep neural networks in speech recognition
    Shi Yin
    Chao Liu
    Zhiyong Zhang
    Yiye Lin
    Dong Wang
    Javier Tejedor
    Thomas Fang Zheng
    Yinguo Li
    EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [2] NOISY TRAINING FOR DEEP NEURAL NETWORKS
    Meng, Xiangtao
    Liu, Chao
    Zhang, Zhiyong
    Wang, Dong
    2014 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (CHINASIP), 2014, : 16 - 20
  • [3] SYNTHESIZED STEREO MAPPING VIA DEEP NEURAL NETWORKS FOR NOISY SPEECH RECOGNITION
    Du, Jun
    Dai, Li-Rong
    Huo, Qiang
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [4] An Analysis of Deep Neural Networks in Broad Phonetic Classes for Noisy Speech Recognition
    de-la-Calle-Silos, F.
    Gallardo-Antolin, A.
    Pelaez-Moreno, C.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 87 - 96
  • [5] Speech recognition in noisy environments with Convolutional Neural Networks
    Santos, Rafael M.
    Matos, Leonardo N.
    Macedo, Hendrik T.
    Montalvao, Jugurta
    2015 BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2015), 2015, : 175 - 179
  • [6] Noisy speech recognition by hierarchical recurrent neural fuzzy networks
    Juang, CF
    Chiou, CT
    Huang, HJ
    2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS, 2005, : 5122 - 5125
  • [7] DEEP MAXOUT NEURAL NETWORKS FOR SPEECH RECOGNITION
    Cai, Meng
    Shi, Yongzhe
    Liu, Jia
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 291 - 296
  • [8] Deep Neural Networks in Russian Speech Recognition
    Markovnikov, Nikita
    Kipyatkova, Irina
    Karpov, Alexey
    Filchenkov, Andrey
    ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE, 2018, 789 : 54 - 67
  • [9] Binary Deep Neural Networks for Speech Recognition
    Xiang, Xu
    Qian, Yanmin
    Yu, Kai
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 533 - 537
  • [10] SPEECH RECOGNITION WITH DEEP RECURRENT NEURAL NETWORKS
    Graves, Alex
    Mohamed, Abdel-rahman
    Hinton, Geoffrey
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6645 - 6649