A Conditional Generative Model for Speech Enhancement

Cited by: 9
Authors
Li, Zeng-Xi [1]
Dai, Li-Rong [1]
Song, Yan [1]
McLoughlin, Ian [2]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] Univ Kent, Sch Comp, Medway, England
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Speech enhancement; Generative model; Adversarial training; NOISE;
DOI
10.1007/s00034-018-0798-4
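Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract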
Deep learning-based speech enhancement approaches like deep neural networks (DNN) and Long Short-Term Memory (LSTM) have already demonstrated superior results to classical methods. However, these methods do not take full advantage of temporal context information. While DNN and LSTM consider temporal context in the noisy source speech, they do not do so for the estimated clean speech. Both DNN and LSTM also have a tendency to over-smooth spectra, which causes the enhanced speech to sound muffled. This paper proposes a novel architecture to address both issues, which we term a conditional generative model (CGM). By adopting an adversarial training scheme applied to a generator of deep dilated convolutional layers, CGM is designed to model the joint and symmetric conditions of both noisy and estimated clean spectra. We evaluate CGM against both DNN and LSTM in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on TIMIT sentences corrupted by ITU-T P.501 and NOISEX-92 noise in a range of matched and mismatched noise conditions. Results show that both the CGM architecture and the adversarial training mechanism lead to better PESQ and STOI in all tested noise conditions. In addition to yielding significant improvements in PESQ and STOI, CGM and adversarial training both mitigate over-smoothing.
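The abstract mentions a generator built from deep dilated convolutional layers. As a rough illustration of why dilation helps capture temporal context, the sketch below computes the receptive field of a stack of dilated 1-D convolutions and applies one such layer to a toy sequence. The kernel size, the dilation schedule, and the function names are assumptions chosen for illustration; they are not taken from the paper, and this is not the authors' implementation.

```python
def receptive_field(kernel_size, dilations):
    """Input frames seen by one output frame of stacked
    stride-1 dilated 1-D convolutions."""
    rf = 1
    for d in dilations:
        # Each layer widens the view by (kernel_size - 1) * dilation frames.
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(x, weights, dilation):
    """'Valid' dilated 1-D convolution (cross-correlation form)
    over a plain list of numbers."""
    k = len(weights)
    span = (k - 1) * dilation  # distance between first and last tap
    return [
        sum(weights[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span)
    ]


# Hypothetical settings, not from the paper: kernel size 3,
# dilation doubling at each of four layers.
dilations = [1, 2, 4, 8]
context = receptive_field(3, dilations)

# Toy example: a two-tap kernel with dilation 2 sums frames two steps apart.
toy = dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2)
```

With these assumed settings, four layers cover 31 input frames, whereas four undilated layers with the same kernel size would cover only 9.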
Pages: 5005-5022
Page count: 18
Related papers
50 in total
  • [21] Cross Conditional Network for Speech Enhancement
    Tanaka, Haruki
    Sugiura, Yosuke
    Yasui, Nozomiko
    Shimamura, Tetsuya
    Miyazaki, Ryoichi
    2019 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ISPACS), 2019,
  • [22] Speech Enhancement via Residual Dense Generative Adversarial Network
    Zhou, Lin
    Zhong, Qiuyue
    Wang, Tianyi
    Lu, Siyuan
    Hu, Hongmei
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2021, 38 (03): : 279 - 289
  • [23] LANGUAGE AND NOISE TRANSFER IN SPEECH ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK
    Pascual, Santiago
    Park, Maruchan
    Serra, Joan
    Bonafonte, Antonio
    Ahn, Kang-Hun
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5019 - 5023
  • [24] A New Method for Improving Generative Adversarial Networks in Speech Enhancement
    Yang, Fan
    Li, Junfeng
    Yan, Yonghong
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [25] Conditional Emission Densities for Combining Speech Enhancement and Recognition Systems
    Sehr, Armin
    Yoshioka, Takuya
    Delcroix, Marc
    Kinoshita, Keisuke
    Nakatani, Tomohiro
    Maas, Roland
    Kellermann, Walter
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3469 - 3473
  • [26] Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction
    Lin, Ju
    Niu, Sufeng
    Wei, Zice
    Lan, Xiang
    van Wijngaarden, Adriaan J.
    Smith, Melissa C.
    Wang, Kuang-Ching
    INTERSPEECH 2019, 2019, : 3163 - 3167
  • [27] A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement
    Du, Zhihao
    Zhang, Xueliang
    Han, Jiqing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1493 - 1505
  • [28] Time-domain speech enhancement using generative adversarial networks
    Pascual, Santiago
    Serra, Joan
    Bonafonte, Antonio
    SPEECH COMMUNICATION, 2019, 114 : 10 - 21
  • [29] On the Use of Audio Fingerprinting Features for Speech Enhancement with Generative Adversarial Network
    Faraji, Farnood
    Attabi, Yazid
    Champagne, Benoit
    Zhu, Wei-Ping
    2020 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), 2020, : 77 - 82
  • [30] Speech Enhancement Based on A New Architecture of Wasserstein Generative Adversarial Networks
    Ye, Shuaishuai
    Jiang, Ting
    Qin, Shan
    Zou, Weixia
    Deng, Chengyun
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 399 - 403