TIME-FREQUENCY MASKING-BASED SPEECH ENHANCEMENT USING GENERATIVE ADVERSARIAL NETWORK

Citations: 0
Authors
Soni, Meet H. [1]
Shah, Neil [1]
Patil, Hemant A. [1]
Affiliations
[1] Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India
Keywords
Task-dependent masking; speech enhancement; generative adversarial networks
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The success of time-frequency (T-F) mask-based approaches depends on how accurately the mask is predicted from the noisy spectral features. State-of-the-art methods in T-F masking-based enhancement employ a Deep Neural Network (DNN) to predict the mask. Recently, Generative Adversarial Networks (GANs) have been gaining popularity as an alternative to maximum likelihood (ML)-based optimization of deep learning architectures. In this paper, we propose to exploit a GAN in the T-F masking-based enhancement framework. We present a viable strategy for using a GAN in this application by modifying the existing approach. To achieve this, we use a method that learns the mask implicitly while predicting the clean T-F representation. Moreover, we show that a vanilla GAN fails to predict an accurate mask, and we propose a regularized objective function that adds the Mean Square Error (MSE) between the predicted and target spectra to overcome this failure. Objective evaluation of the proposed method shows improved mask prediction accuracy compared with state-of-the-art ML-based optimization techniques. The proposed system significantly improves speech quality over a recent GAN-based speech enhancement system, while maintaining a better trade-off between low speech distortion and effective removal of the background interference present in the noisy mixture.
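The two ideas in the abstract, implicit mask learning and an MSE-regularized adversarial objective, can be illustrated with a minimal sketch. The abstract does not give the exact loss, its weighting, or the network architecture, so everything below is an assumption made for illustration: the sigmoid mask activation, the non-saturating adversarial term, the weight `lam`, and the function names `enhance` and `generator_loss` are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Squash values into (0, 1) so they can act as a soft T-F mask."""
    return 1.0 / (1.0 + np.exp(-x))


def enhance(noisy_spec, mask_logits):
    """Implicit masking (assumed form): the mask is an intermediate
    quantity that multiplies the noisy spectrum, so training against the
    clean spectrum shapes the mask without a separate mask target."""
    mask = sigmoid(mask_logits)
    return mask * noisy_spec, mask


def generator_loss(d_fake, enhanced, clean, lam=1.0):
    """Adversarial term plus MSE regularizer (assumed form).

    d_fake : discriminator outputs in (0, 1] for the enhanced spectra
    lam    : hypothetical weight on the MSE term
    """
    adversarial = -np.mean(np.log(d_fake + 1e-12))  # non-saturating GAN loss
    mse = np.mean((enhanced - clean) ** 2)           # predicted vs. target spectrum
    return adversarial + lam * mse
```

In this sketch, setting `lam` to zero recovers a plain adversarial objective, which is the setting the abstract reports as failing to predict an accurate mask; the MSE term ties the enhanced spectrum directly to the target.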
Pages: 5039-5043
Page count: 5
Related papers
50 in total
  • [21] A Data Field method for speech enhancement incorporating Binary Time-Frequency Masking
    Huang, Jianjun
    Zhang, Yafei
    Zhang, Xiongwei
    Zhu, Tao
    PRZEGLAD ELEKTROTECHNICZNY, 2011, 87 (07): 225-229
  • [22] Post-processing in masking-based β-order MMSE speech enhancement
    Zhang, Xinxin
    Koh, Soo Ngee
    Soon, Ing Yann
    You, Changhuai
    APPLIED ACOUSTICS, 2008, 69 (04): 354-357
  • [23] Wavelet-Based Speech Enhancement Using Time-Frequency Adaptation
    Wang, Kun-Ching
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2009
  • [24] Wavelet-Based Speech Enhancement Using Time-Frequency Adaptation
    Kun-Ching Wang
    EURASIP Journal on Advances in Signal Processing, 2009
  • [25] TIME-FREQUENCY MASKING BASED ONLINE SPEECH ENHANCEMENT WITH MULTI-CHANNEL DATA USING CONVOLUTIONAL NEURAL NETWORKS
    Chakrabarty, Soumitro
    Wang, DeLiang
    Habets, Emanuel A. P.
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018: 476-480
  • [26] Dual channel neural network speech enhancement algorithm based on time frequency masking
    Jia, Hairong
    Mei, Shulin
    Zhang, Min
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2021, 49 (06): 43-49
  • [27] Time-domain speech enhancement using generative adversarial networks
    Pascual, Santiago
    Serra, Joan
    Bonafonte, Antonio
    SPEECH COMMUNICATION, 2019, 114: 10-21
  • [28] VSEGAN: VISUAL SPEECH ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK
    Xu, Xinmeng
    Wang, Yang
    Xu, Dongxiang
    Peng, Yiyuan
    Zhang, Cong
    Jia, Jie
    Chen, Binbin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 7307-7311
  • [29] Robust Automatic Speech Recognition System Based on Using Adaptive Time-Frequency Masking
    Gouda, Ahmed Mostafa
    Tamazin, Mohamed
    Khedr, Mohamed
    PROCEEDINGS OF 2016 11TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2016: 181-186
  • [30] Single-channel speech enhancement using improved progressive deep neural network and masking-based harmonic regeneration
    Ping, Huang
    Yafeng, Wu
    SPEECH COMMUNICATION, 2022, 145: 36-46