GENERATIVE ADVERSARIAL NETWORKS BASED DATA AUGMENTATION FOR NOISE ROBUST SPEECH RECOGNITION

Times cited: 0

Authors:

Hu, Hu [1]

Tan, Tian [1]

Qian, Yanmin [1,2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, SpeechLab, Shanghai, Peoples R China

[2] Tencent, Tencent AI Lab, Bellevue, WA 98004 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

robust speech recognition; very deep convolutional neural network; data augmentation; generative adversarial networks; unsupervised learning; DEEP-NEURAL-NETWORKS;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Data augmentation is an effective way to increase the amount of training data and to reduce the mismatch between training and testing conditions in noise-robust speech recognition. Unlike traditional approaches that add noise directly to the original waveform, this work uses generative adversarial networks (GANs) to generate data that improves speech recognition under noisy conditions. In this method, speech samples are generated at the spectral-feature level, frame by frame and with no dependence between frames, and the augmented data have no true labels. To use these untranscribed augmented data effectively, an unsupervised learning framework is designed for acoustic modeling. The proposed GAN-based data augmentation approach is evaluated on Aurora4. The experimental results show that it yields a relative WER reduction of approximately 7.0% over an advanced acoustic model.
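The abstract combines two ingredients: frame-level GAN generation of spectral features, and training on the resulting untranscribed frames. The minimal Python sketch below illustrates both objectives under explicit assumptions. This record does not describe the authors' networks, so the linear generator, the logistic discriminator, the 40-dimensional features, and the teacher-student soft-label loss are all hypothetical stand-ins, not the paper's method. The sketch only computes losses. It does not train anything.

```python
import math
import random

# Hypothetical sizes chosen for illustration only (not from the paper):
# 40-dim filterbank-like feature frames and 10-dim latent noise vectors.
FEAT_DIM = 40
NOISE_DIM = 10
EPS = 1e-12  # guards against log(0)


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def affine(weights, bias, vec):
    """Return weights @ vec + bias, with weights given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def generate_frames(gen_w, gen_b, n_frames, rng):
    """Toy linear generator: draw a fresh noise vector for every frame, so the
    generated frames are mutually independent (no temporal context)."""
    frames = []
    for _ in range(n_frames):
        z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
        frames.append(affine(gen_w, gen_b, z))
    return frames


def discriminate(disc_w, disc_b, frame):
    """Toy logistic discriminator: probability that one frame is real."""
    return sigmoid(sum(w * x for w, x in zip(disc_w, frame)) + disc_b)


def gan_losses(real, fake, disc_w, disc_b):
    """Standard GAN losses averaged over frames:
    d_loss = -mean log D(real) - mean log(1 - D(fake))
    g_loss = -mean log D(fake)  (non-saturating generator loss)"""
    d_real = [discriminate(disc_w, disc_b, x) for x in real]
    d_fake = [discriminate(disc_w, disc_b, x) for x in fake]
    d_loss = (-sum(math.log(p + EPS) for p in d_real) / len(d_real)
              - sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake))
    g_loss = -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)
    return d_loss, g_loss


def soft_label_loss(teacher_post, student_post):
    """ASSUMPTION: one common way to train on untranscribed frames is to use a
    teacher model's posteriors as soft targets. This is cross-entropy against
    those targets, not necessarily the paper's unsupervised framework."""
    return -sum(t * math.log(s + EPS) for t, s in zip(teacher_post, student_post))


# Example usage with random stand-in parameters and data.
rng = random.Random(0)
gen_w = [[rng.gauss(0.0, 0.1) for _ in range(NOISE_DIM)] for _ in range(FEAT_DIM)]
gen_b = [0.0] * FEAT_DIM
disc_w = [rng.gauss(0.0, 0.1) for _ in range(FEAT_DIM)]
disc_b = 0.0

fake = generate_frames(gen_w, gen_b, 8, rng)
# Stand-in for real noisy feature frames (random numbers, not Aurora4 data).
real = [[rng.gauss(1.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(8)]
d_loss, g_loss = gan_losses(real, fake, disc_w, disc_b)
```

Drawing a fresh noise vector for every frame is what makes the generated frames independent. Each frame is produced without reference to any utterance, so it has no alignment to a transcript. For that reason, some label-free training objective is needed to use the frames in acoustic modeling.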


Pages: 5044-5048

Page count: 5

Number of references: 25

