A Conditional Generative Model for Speech Enhancement

Cited by: 0

Authors:

Zeng-Xi Li

Li-Rong Dai

Yan Song

Ian McLoughlin

Affiliations:

[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing

[2] University of Kent, School of Computing

Source:

Circuits, Systems, and Signal Processing | 2018 / Vol. 37

Keywords:

Deep learning; Speech enhancement; Generative model; Adversarial training

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Deep learning-based speech enhancement approaches such as deep neural networks (DNN) and long short-term memory (LSTM) networks have already demonstrated superior results to classical methods. However, these methods do not take full advantage of temporal context information. While DNN and LSTM consider temporal context in the noisy source speech, they do not do so for the estimated clean speech. Both DNN and LSTM also tend to over-smooth spectra, which causes the enhanced speech to sound muffled. This paper proposes a novel architecture to address both issues, which we term a conditional generative model (CGM). By applying an adversarial training scheme to a generator built from deep dilated convolutional layers, CGM is designed to model the joint and symmetric conditions of both noisy and estimated clean spectra. We evaluate CGM against both DNN and LSTM in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on TIMIT sentences corrupted by ITU-T P.501 and NOISEX-92 noise in a range of matched and mismatched noise conditions. Results show that both the CGM architecture and the adversarial training mechanism lead to better PESQ and STOI in all tested noise conditions. In addition to yielding significant improvements in PESQ and STOI, CGM and adversarial training both mitigate over-smoothing.

Pages: 5005-5022

Page count: 17

