Speech Enhancement for Noise-Robust Speech Synthesis using Wasserstein GAN

Cited by: 14
Authors
Adiga, Nagaraj [1]
Pantazis, Yannis [2]
Tsiaras, Vassilis [1]
Stylianou, Yannis [1]
Affiliations
[1] Univ Crete, Dept Comp Sci, Iraklion, Greece
[2] FORTH, Inst Appl & Computat Math, Iraklion, Greece
Source
INTERSPEECH 2019 | 2019
Keywords
Wasserstein GAN; Speech Enhancement; Gated activation; WaveNet Vocoder; Speech Synthesis
DOI
10.21437/Interspeech.2019-2648
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Background noise in the recordings can significantly degrade the quality of speech synthesis systems. Although existing speech enhancement techniques effectively suppress additive noise under low signal-to-noise ratio (SNR) conditions, they have been neither designed for nor tested on speech synthesis tasks, where background noise has relatively lower energy. In this paper, we propose a speech enhancement technique based on generative adversarial networks (GANs) that acts as a preprocessing step for speech synthesis. Motivated by the speech enhancement generative adversarial network (SEGAN) approach and recent advances in deep learning, we propose applying Wasserstein GAN (WGAN) with gradient penalty and gated activation functions to the autoencoder network of SEGAN. We studied the impact of the proposed method on a data set consisting of 28 speakers and different noise types at three different SNR levels. The effectiveness of the proposed method in the context of speech synthesis is demonstrated by training a WaveNet vocoder. We compare our method against SEGAN. Both subjective and objective metrics confirm that the proposed speech enhancement approach outperforms SEGAN in terms of speech synthesis quality.
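The two ingredients named in the abstract, gated activations and the WGAN gradient penalty, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not say which gating form is used, so the tanh-sigmoid gate familiar from WaveNet is assumed here; the function names, the penalty weight `lam=10.0`, and the convention of passing precomputed critic scores and gradient norms are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_activation(filter_pre, gate_pre):
    """Elementwise tanh(filter) * sigmoid(gate).

    Assumed gating form (WaveNet-style); the abstract does not specify it.
    Both inputs are equal-length lists of pre-activation values.
    """
    return [math.tanh(f) * sigmoid(g) for f, g in zip(filter_pre, gate_pre)]


def wgan_gp_critic_loss(real_scores, fake_scores, grad_norms, lam=10.0):
    """Critic loss for WGAN with gradient penalty.

    real_scores / fake_scores: critic outputs on clean and enhanced samples.
    grad_norms: L2 norms of the critic's input gradient, evaluated at points
                interpolated between real and generated samples (assumed to
                be computed elsewhere by an autodiff framework).
    lam: penalty weight (hypothetical default).
    """
    mean = lambda xs: sum(xs) / len(xs)
    wasserstein_term = mean(fake_scores) - mean(real_scores)
    penalty = mean([(n - 1.0) ** 2 for n in grad_norms])
    return wasserstein_term + lam * penalty
```

In a real training loop the gradient norms would come from automatic differentiation of the critic with respect to the interpolated inputs; the sketch only shows how the terms combine once those values are available.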
Pages: 1821-1825
Number of pages: 5