A Conditional Generative Model for Speech Enhancement

被引:9
|
作者
Li, Zeng-Xi [1 ]
Dai, Li-Rong [1 ]
Song, Yan [1 ]
McLoughlin, Ian [2 ]
机构
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] Univ Kent, Sch Comp, Medway, England
基金
中国国家自然科学基金;
关键词
Deep learning; Speech enhancement; Generative model; Adversarial training; NOISE;
D O I
10.1007/s00034-018-0798-4
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Deep learning-based speech enhancement approaches like deep neural networks (DNN) and Long Short-Term Memory (LSTM) have already demonstrated superior results to classical methods. However, these methods do not take full advantage of temporal context information. While DNN and LSTM consider temporal context in the noisy source speech, it does not do so for the estimated clean speech. Both DNN and LSTM also have a tendency to over-smooth spectra, which causes the enhanced speech to sound muffled. This paper proposes a novel architecture to address both issues, which we term a conditional generative model (CGM). By adopting an adversarial training scheme applied to a generator of deep dilated convolutional layers, CGM is designed to model the joint and symmetric conditions of both noisy and estimated clean spectra. We evaluate CGM against both DNN and LSTM in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on TIMIT sentences corrupted by ITU-T P.501 and NOISEX-92 noise in a range of matched and mismatched noise conditions. Results show that both the CGM architecture and the adversarial training mechanism lead to better PESQ and STOI in all tested noise conditions. In addition to yielding significant improvements in PESQ and STOI, CGM and adversarial training both mitigate against over-smoothing.
引用
收藏
页码:5005 / 5022
页数:18
相关论文
共 50 条
  • [41] DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement
    Souibgui, Mohamed Ali
    Kessentini, Yousri
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (03) : 1180 - 1191
  • [42] SINGLE AND FEW-STEP DIFFUSION FOR GENERATIVE SPEECH ENHANCEMENT
    Lay, Bunlong
    Lermercier, Jean-Marie
    Richter, Julius
    Gerkmann, Timo
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 626 - 630
  • [43] A Loss With Mixed Penalty for Speech Enhancement Generative Adversarial Network
    Cao, Jie
    Zhou, Yaofeng
    Yu, Hong
    Li, Xiaoxu
    Wang, Dan
    Ma, Zhanyu
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 86 - 90
  • [44] Enhancement of Alaryngeal Speech using Generative Adversarial Network (GAN)
    Huq, Mahmudul
    2021 IEEE/ACS 18TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2021,
  • [45] SELF-ATTENTION GENERATIVE ADVERSARIAL NETWORK FOR SPEECH ENHANCEMENT
    Huy Phan
    Nguyen, Huy Le
    Chen, Oliver Y.
    Koch, Philipp
    Duong, Ngoc Q. K.
    McLoughlin, Ian
    Mertins, Alfred
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7103 - 7107
  • [46] INDIRECT MODEL-BASED SPEECH ENHANCEMENT
    Le Roux, Jonathan
    Hershey, John R.
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4045 - 4048
  • [47] Audio-Visual Speech Enhancement Using Conditional Variational Auto-Encoders
    Sadeghi, Mostafa
    Leglaive, Simon
    Alameda-Pineda, Xavier
    Girin, Laurent
    Horaud, Radu
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1788 - 1800
  • [48] Speech Enhancement and Dereverberation With Diffusion-Based Generative Models
    Richter, Julius
    Welker, Simon
    Lemercier, Jean-Marie
    Lay, Bunlong
    Gerkmann, Timo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2351 - 2364
  • [49] [Invited] Generative Model-Based Text-to-Speech Synthesis
    Zen, Heiga
    2018 IEEE 7TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE 2018), 2018, : 327 - 328
  • [50] WaveTract: A hybrid generative model for speech synthesis
    Englert, Bruno Bence
    Zainko, Csaba
    Nemeth, Geza
    2019 10TH INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2019,