Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction

Times cited: 6
Authors
Lin, Ju [1]
Niu, Sufeng [2]
Wei, Zice [1]
Lan, Xiang [1]
van Wijngaarden, Adriaan J. [3]
Smith, Melissa C. [1]
Wang, Kuang-Ching [1]
Affiliations
[1] Clemson Univ, Dept Elect & Comp Engn, Clemson, SC 29634 USA
[2] LinkedIn Inc, Mountain View, CA USA
[3] Nokia, Nokia Bell Labs, Murray Hill, NJ USA
Source
INTERSPEECH 2019 | 2019
Keywords
speech enhancement; generative adversarial network; log-power spectra; NOISE;
DOI
10.21437/Interspeech.2019-2954
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
Speech enhancement techniques that use a generative adversarial network (GAN) can effectively suppress noise while allowing models to be trained end-to-end. However, such techniques operate directly on time-domain waveforms, which are often high-dimensional and require extensive computation. This paper proposes a novel GAN-based speech enhancement method, referred to as S-ForkGAN, that operates on log-power spectra rather than on time-domain speech waveforms, and uses a forked GAN structure to extract both speech and noise information. By operating on log-power spectra, one can seamlessly include conventional spectral subtraction techniques, and the parameter space typically has a lower dimension. The performance of S-ForkGAN is assessed for automatic speech recognition (ASR) using the TIMIT data set and a wide range of noise conditions. It is shown that S-ForkGAN outperforms existing GAN-based techniques and that it has a lower complexity.
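The abstract names two signal-processing ingredients, log-power spectra and conventional spectral subtraction, without showing them. The following is a minimal NumPy sketch of those two steps only; it is not the S-ForkGAN model and makes no claim about the authors' implementation. The function names, frame length, hop size, number of noise-only frames, and spectral floor are all illustrative assumptions.

```python
import numpy as np


def log_power_spectra(signal, frame_len=512, hop=256, eps=1e-10):
    """Return log-power spectra with shape (n_frames, frame_len // 2 + 1).

    frame_len, hop, and eps are assumed values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)


def spectral_subtraction(noisy_lps, noise_frames=5, floor=0.01):
    """Basic power spectral subtraction applied to log-power spectra.

    Assumes the first `noise_frames` frames contain noise only; the noise
    power estimate is their average. `floor` keeps the result positive.
    """
    power = np.exp(noisy_lps)
    noise_est = power[:noise_frames].mean(axis=0, keepdims=True)
    cleaned = np.maximum(power - noise_est, floor * power)
    return np.log(cleaned)


# Synthetic example: a 440 Hz tone that starts after 0.25 s, plus white noise.
sample_rate = 16000
rng = np.random.default_rng(0)
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
tone[: sample_rate // 4] = 0.0  # leading segment is noise only
noisy = tone + 0.1 * rng.standard_normal(sample_rate)

lps = log_power_spectra(noisy)
enhanced = spectral_subtraction(lps)
print(lps.shape, enhanced.shape)
```

In a spectral-domain enhancement pipeline of the kind the abstract describes, arrays like `lps` would be the network input rather than raw waveform samples, which is where the lower dimensionality comes from: each frame here is 257 values instead of 512 samples.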
Pages: 3163-3167
Page count: 5