GENERATIVE ADVERSARIAL NETWORKS BASED DATA AUGMENTATION FOR NOISE ROBUST SPEECH RECOGNITION

Cited by: 0
Authors
Hu, Hu [1]
Tan, Tian [1]
Qian, Yanmin [1,2]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, SpeechLab, Shanghai, Peoples R China
[2] Tencent, Tencent AI Lab, Bellevue, WA 98004 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
robust speech recognition; very deep convolutional neural network; data augmentation; generative adversarial networks; unsupervised learning; DEEP-NEURAL-NETWORKS;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Data augmentation is an effective way to increase the amount of training data and reduce the mismatch between training and test conditions in noise-robust speech recognition. Unlike traditional approaches that add noise directly to the original waveform, this work uses generative adversarial networks (GANs) to generate data that improves speech recognition under noisy conditions. With this method, speech samples are generated at the spectral feature level, frame by frame and with no dependence between frames, and the augmented data has no true labels. To make effective use of this untranscribed augmented data, an unsupervised learning framework is designed for acoustic modeling. The proposed GAN-based data augmentation approach is evaluated on Aurora4. Experimental results show that it yields a relative WER reduction of approximately 7.0% over an advanced acoustic model.
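
For readers unfamiliar with the metric, the sketch below shows how a relative WER reduction like the roughly 7.0% figure reported in the abstract is typically computed. The function name `relative_wer_reduction` and the two WER values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in percent), NOT taken from the paper.
baseline = 10.0   # assumed WER of the acoustic model without augmentation
augmented = 9.3   # assumed WER after adding the generated data

reduction = relative_wer_reduction(baseline, augmented)
print(f"{reduction:.1f}% relative WER reduction")
```

A relative reduction is measured against the baseline error rate, so the same absolute gain counts for more when the baseline WER is already low.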
Pages: 5044-5048
Number of pages: 5