TIME-FREQUENCY MASKING-BASED SPEECH ENHANCEMENT USING GENERATIVE ADVERSARIAL NETWORK

被引:0
|
作者
Soni, Meet H. [1 ]
Shah, Neil [1 ]
Patil, Hemant A. [1 ]
机构
[1] Dhirubhai Ambani Inst Informat & Commun Technol, Gandhinagar, India
来源
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018年
关键词
Task-dependent masking; speech enhancement; generative adversarial networks;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
The success of time-frequency (T-F) mask-based approaches is dependent on the accuracy of predicted mask given the noisy spectral features. The state-of-the-art methods in T-F masking-based enhancement employ Deep Neural Network (DNN) to predict mask. Recently, Generative Adversarial Networks (GAN) are gaining popularity instead of maximum likelihood (ML)-based optimization of deep learning architectures. In this paper, we propose to exploit GAN in TF masking-based enhancement framework. We present the viable strategy to use GAN in such application by modifying the existing approach. To achieve this, we use a method that learns the mask implicitly while predicting the clean TF representation. Moreover, we show the failure of vanilla GAN in predicting the accurate mask and propose a regularized objective function with the use of Mean Square Error (MSE) between predicted and target spectrum to overcome it. The objective evaluation of the proposed method shows the improvement in the accurate mask prediction, as against the state-of-the-art ML-based optimization techniques. The proposed system significantly improves over a recent GAN-based speech enhancement system in improving speech quality, while maintaining a better trade-off between less speech distortion and more effective removal of background interferences present in the noisy mixture.
引用
收藏
页码:5039 / 5043
页数:5
相关论文
共 50 条
  • [21] A Phase-Based Time-Frequency masking for multi-channel speech enhancement in domestic environments
    Brutti, Alessio
    Tsiami, Antigoni
    Katsamanis, Athanasios
    Maragos, Petros
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2875 - 2879
  • [22] SPEECH ENHANCEMENT BASED ON JOINT TIME-FREQUENCY SEGMENTATION
    Tantibundhit, C.
    Pernkopf, F.
    Kubin, G.
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4673 - +
  • [23] Lightweight End-to-End Speech Enhancement Generative Adversarial Network Using Sinc Convolutions
    Li, Lujun
    Wudamu
    Kuerzinger, Ludwig
    Watzel, Tobias
    Rigoll, Gerhard
    APPLIED SCIENCES-BASEL, 2021, 11 (16):
  • [24] Improved Wasserstein conditional generative adversarial network speech enhancement
    Qin, Shan
    Jiang, Ting
    EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, 2018,
  • [25] Improved Wasserstein conditional generative adversarial network speech enhancement
    Shan Qin
    Ting Jiang
    EURASIP Journal on Wireless Communications and Networking, 2018
  • [26] Time-Frequency Masking Based Online Multi-Channel Speech Enhancement With Convolutional Recurrent Neural Networks
    Chakrabarty, Soumitro
    Habets, Emanuel A. P.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 787 - 799
  • [27] DENSELY CONNECTED NETWORK WITH TIME-FREQUENCY DILATED CONVOLUTION FOR SPEECH ENHANCEMENT
    Li, Yaxing
    Li, Xiaoqi
    Dong, Yuanjie
    Li, Meng
    Xu, Shan
    Xiong, Shengwu
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6860 - 6864
  • [28] SELF-ATTENTION GENERATIVE ADVERSARIAL NETWORK FOR SPEECH ENHANCEMENT
    Huy Phan
    Nguyen, Huy Le
    Chen, Oliver Y.
    Koch, Philipp
    Duong, Ngoc Q. K.
    McLoughlin, Ian
    Mertins, Alfred
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7103 - 7107
  • [29] Noise estimation based on time-frequency correlation for speech enhancement
    Yuan, Wenhao
    Lin, Jiajun
    An, Wei
    Wang, Yu
    Chen, Ning
    APPLIED ACOUSTICS, 2013, 74 (05) : 770 - 781
  • [30] Speech Enhancement Using Generative Adversarial Network by Distilling Knowledge from Statistical Method
    Wu, Jianfeng
    Hua, Yongzhu
    Yang, Shengying
    Qin, Hongshuai
    Qin, Huibin
    APPLIED SCIENCES-BASEL, 2019, 9 (16):