Speech Enhancement via Residual Dense Generative Adversarial Network

Cited by: 6
Authors
Zhou, Lin [1 ]
Zhong, Qiuyue [1 ]
Wang, Tianyi [1 ]
Lu, Siyuan [1 ]
Hu, Hongmei [2 ,3 ]
Affiliations
[1] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
[2] Carl von Ossietzky Univ Oldenburg, Med Phys, D-26129 Oldenburg, Germany
[3] Carl von Ossietzky Univ Oldenburg, Cluster Excellence Hearing4all, Dept Med Phys & Acoust, D-26129 Oldenburg, Germany
Source
COMPUTER SYSTEMS SCIENCE AND ENGINEERING | 2021, Vol. 38, No. 03
Keywords
Generative adversarial networks; neural networks; residual dense block; speech enhancement; NOISE;
DOI
10.32604/csse.2021.016524
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Generative adversarial networks (GANs) have received increasing attention for end-to-end speech enhancement in recent years. Various GAN-based enhancement methods have been proposed to improve the quality of reconstructed speech, yet their performance still lags behind that of masking-based methods. To tackle this problem, we propose a speech enhancement method based on a residual dense generative adversarial network (RDGAN), which maps the log-power spectrum (LPS) of degraded speech to that of clean speech. Specifically, a residual dense block (RDB) architecture is designed to better estimate the clean-speech LPS; it extracts rich local features of the LPS through densely connected convolution layers. Meanwhile, sequential RDB connections are applied at multiple scales of the LPS, which significantly increases the flexibility and robustness of feature learning in the time-frequency domain. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. In particular, under untrained acoustic conditions with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and unmatched noise category, RDGAN still outperforms existing GAN-based methods and a masking-based method in terms of PESQ and other evaluation indexes, indicating that our method generalizes better to untrained conditions.
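Below is a minimal sketch of such a residual dense block, assuming a PyTorch-style 2-D convolutional implementation that treats the LPS as a stack of feature maps; the channel width, growth rate, layer count, and kernel sizes are illustrative assumptions and are not values reported in the paper.

# Minimal residual dense block (RDB) sketch; hyperparameters are illustrative.
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    def __init__(self, channels=64, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Each conv sees the concatenation of the block input and all
            # previous layer outputs (dense connectivity).
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
        # 1x1 conv fuses the densely concatenated features back to `channels`.
        self.fusion = nn.Conv2d(channels + num_layers * growth, channels, kernel_size=1)

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        # Local residual connection: fused dense features plus the block input.
        return x + self.fusion(torch.cat(features, dim=1))

# Usage example: a batch of LPS patches after an initial convolution has
# lifted the single-channel spectrogram to 64 feature maps.
if __name__ == "__main__":
    lps = torch.randn(2, 64, 128, 128)   # (batch, channels, freq, time)
    print(ResidualDenseBlock()(lps).shape)

In the RDGAN generator described in the abstract, several such blocks would be chained sequentially at different scales of the LPS; the sketch above only illustrates the dense connectivity and local residual connection of a single block.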
Pages: 279-289
Page count: 11