A Conditional Generative Model for Speech Enhancement

Cited by: 0

Authors:

Zeng-Xi Li

Li-Rong Dai

Yan Song

Ian McLoughlin

Affiliations:

[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing

[2] University of Kent, School of Computing

Source:

Circuits, Systems, and Signal Processing | 2018 / Vol. 37

Keywords:

Deep learning; Speech enhancement; Generative model; Adversarial training

DOI:

Not available

Abstract:

Deep learning-based speech enhancement approaches such as deep neural networks (DNN) and Long Short-Term Memory (LSTM) networks have already demonstrated superior results to classical methods. However, these methods do not take full advantage of temporal context information: while DNN and LSTM consider temporal context in the noisy source speech, they do not do so for the estimated clean speech. Both DNN and LSTM also tend to over-smooth spectra, which causes the enhanced speech to sound muffled. This paper proposes a novel architecture, termed a conditional generative model (CGM), to address both issues. By adopting an adversarial training scheme applied to a generator of deep dilated convolutional layers, CGM is designed to model the joint and symmetric conditions of both noisy and estimated clean spectra. We evaluate CGM against both DNN and LSTM in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on TIMIT sentences corrupted by ITU-T P.501 and NOISEX-92 noise in a range of matched and mismatched noise conditions. Results show that both the CGM architecture and the adversarial training mechanism lead to better PESQ and STOI in all tested noise conditions. In addition to yielding significant improvements in PESQ and STOI, CGM and adversarial training both mitigate over-smoothing.
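
The abstract names deep dilated convolutional layers as the generator's building block. As a rough illustration of why dilation helps capture temporal context, the sketch below computes how many input frames a single output of a stack of stride-1 dilated 1-D convolutions can see, and applies one dilated filter to a toy sequence. The kernel size, the dilation schedule, and the function names are hypothetical choices for illustration; the abstract does not state the paper's actual configuration.

```python
def receptive_field(kernel_size, dilations):
    """Input frames seen by one output of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(signal, kernel, dilation):
    """Valid (unpadded) 1-D dilated cross-correlation over a list of numbers."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(k * signal[t + i * dilation] for i, k in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


# Hypothetical settings (assumptions, not taken from the paper):
# kernel size 3 and dilations that double at each layer.
dilations = [1, 2, 4, 8]
print(receptive_field(3, dilations))

# A two-tap filter with dilation 2 sums frames that are two steps apart.
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], 2))
```

With doubling dilations the receptive field grows exponentially with depth while the parameter count grows only linearly, which is the usual motivation for dilated stacks when long temporal context is needed.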

Pages: 5005-5022

Page count: 17
