A Conditional Generative Model for Speech Enhancement

Cited by: 0
Authors
Zeng-Xi Li
Li-Rong Dai
Yan Song
Ian McLoughlin
Affiliations
[1] University of Science and Technology of China,National Engineering Laboratory for Speech and Language Information Processing
[2] University of Kent,School of Computing
Source
Circuits, Systems, and Signal Processing | 2018 / Vol. 37
Keywords
Deep learning; Speech enhancement; Generative model; Adversarial training
DOI: not available
Abstract
Deep learning-based speech enhancement approaches such as deep neural networks (DNN) and Long Short-Term Memory (LSTM) networks have already demonstrated results superior to classical methods. However, these methods do not take full advantage of temporal context information: while DNN and LSTM consider temporal context in the noisy source speech, they do not do so for the estimated clean speech. Both DNN and LSTM also tend to over-smooth spectra, which causes the enhanced speech to sound muffled. This paper proposes a novel architecture, termed a conditional generative model (CGM), to address both issues. By applying an adversarial training scheme to a generator of deep dilated convolutional layers, CGM models the joint and symmetric conditions of both noisy and estimated clean spectra. We evaluate CGM against both DNN and LSTM in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on TIMIT sentences corrupted by ITU-T P.501 and NOISEX-92 noise under a range of matched and mismatched noise conditions. Results show that both the CGM architecture and the adversarial training mechanism yield better PESQ and STOI in all tested noise conditions. In addition to these significant improvements, CGM and adversarial training both mitigate over-smoothing.
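The generator described in the abstract stacks dilated convolutional layers so that each layer sees progressively wider temporal context across spectral frames without a deeper network. A minimal NumPy sketch of the underlying operation (the function names, causal padding choice, and dilation schedule here are illustrative assumptions, not the paper's exact CGM configuration):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """Causal 1-D dilated convolution over a sequence of frames.
    x: (frames,) signal; kernel: (k,) filter taps.
    Zero-pads on the left so output length equals input length,
    and tap j looks back j*dilation frames (illustrative choice)."""
    k = len(kernel)
    pad = dilation * (k - 1)
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(kernel[j] * xp[i + pad - j * dilation] for j in range(k))
        for i in range(len(x))
    ])

def receptive_field(kernel_size, dilations):
    """Frames of temporal context covered by a stack of dilated
    conv layers with the given per-layer dilation factors."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With kernel size 3 and dilations (1, 2, 4, 8), four layers already cover 31 frames of context, which is why dilation is attractive for modelling temporal structure in spectrogram sequences.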
Pages: 5005-5022 (17 pages)
Related papers (50 total)
  • [41] Target Speech Extraction with Conditional Diffusion Model
    Kamo, Naoyuki
    Delcroix, Marc
    Nakatani, Tomohiro
    INTERSPEECH 2023, 2023, : 176 - 180
  • [42] Speech Enhancement with Zero-Shot Model Selection
    Zezario, Ryandhimas E.
    Fuh, Chiou-Shann
    Wang, Hsin-Min
    Tsao, Yu
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 491 - 495
  • [43] Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction
    Lin, Ju
    Niu, Sufeng
    Wei, Zice
    Lan, Xiang
    van Wijngaarden, Adriaan J.
    Smith, Melissa C.
    Wang, Kuang-Ching
    INTERSPEECH 2019, 2019, : 3163 - 3167
  • [44] A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement
    Du, Zhihao
    Zhang, Xueliang
    Han, Jiqing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1493 - 1505
  • [45] Time-domain speech enhancement using generative adversarial networks
    Pascual, Santiago
    Serra, Joan
    Bonafonte, Antonio
    SPEECH COMMUNICATION, 2019, 114 : 10 - 21
  • [46] On the Use of Audio Fingerprinting Features for Speech Enhancement with Generative Adversarial Network
    Faraji, Farnood
    Attabi, Yazid
    Champagne, Benoit
    Zhu, Wei-Ping
    2020 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), 2020, : 77 - 82
  • [47] A Dual Stream Generative Adversarial Network with Phase Awareness for Speech Enhancement
    Liang, Xintao
    Li, Yuhang
    Li, Xiaomin
    Zhang, Yue
    Ding, Youdong
    INFORMATION, 2023, 14 (04)
  • [48] A ROBUST AUDIO-VISUAL SPEECH ENHANCEMENT MODEL
    Wang, Wupeng
    Xing, Chao
    Wang, Dong
    Chen, Xiao
    Sun, Fengyu
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7529 - 7533
  • [49] Speech Enhancement with Topology-enhanced Generative Adversarial Networks (GANs)
    Zhang, Xudong
    Zhao, Liang
    Gu, Feng
    INTERSPEECH 2021, 2021, : 2726 - 2730
  • [50] NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling
    Lee, Chi-Chang
    Hu, Cheng-Hung
    Lin, Yu-Chen
    Chen, Chu-Song
    Wang, Hsin-Min
    Tsao, Yu
    INTERSPEECH 2022, 2022, : 1183 - 1187