A Conditional Generative Model for Speech Enhancement

Authors
Zeng-Xi Li
Li-Rong Dai
Yan Song
Ian McLoughlin
Affiliations
[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing
[2] University of Kent, School of Computing
Source
Circuits, Systems, and Signal Processing | 2018 / Vol. 37
Keywords
Deep learning; Speech enhancement; Generative model; Adversarial training
DOI
Not available
Abstract
Deep learning-based speech enhancement approaches such as deep neural networks (DNN) and Long Short-Term Memory (LSTM) networks have already demonstrated superior results to classical methods. However, these methods do not take full advantage of temporal context information: while DNN and LSTM consider temporal context in the noisy source speech, they do not do so for the estimated clean speech. Both DNN and LSTM also tend to over-smooth spectra, which causes the enhanced speech to sound muffled. This paper proposes a novel architecture, termed a conditional generative model (CGM), to address both issues. By adopting an adversarial training scheme applied to a generator of deep dilated convolutional layers, CGM is designed to model the joint and symmetric conditions of both noisy and estimated clean spectra. We evaluate CGM against both DNN and LSTM in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on TIMIT sentences corrupted by ITU-T P.501 and NOISEX-92 noise in a range of matched and mismatched noise conditions. Results show that both the CGM architecture and the adversarial training mechanism lead to better PESQ and STOI in all tested noise conditions. In addition to yielding significant improvements in PESQ and STOI, CGM and adversarial training both mitigate over-smoothing.
Pages: 5005 - 5022
Number of pages: 17