TIME-FREQUENCY MASKING-BASED SPEECH ENHANCEMENT USING GENERATIVE ADVERSARIAL NETWORK

Cited by: 0
Authors
Soni, Meet H. [1]
Shah, Neil [1]
Patil, Hemant A. [1]
Affiliations
[1] Dhirubhai Ambani Inst Informat & Commun Technol, Gandhinagar, India
Keywords
Task-dependent masking; speech enhancement; generative adversarial networks
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The success of time-frequency (T-F) mask-based approaches depends on the accuracy of the mask predicted from the noisy spectral features. State-of-the-art methods in T-F masking-based enhancement employ a Deep Neural Network (DNN) to predict the mask. Recently, Generative Adversarial Networks (GANs) have been gaining popularity as an alternative to maximum likelihood (ML)-based optimization of deep learning architectures. In this paper, we propose to exploit GANs in a T-F masking-based enhancement framework. We present a viable strategy for using a GAN in such an application by modifying the existing approach. To achieve this, we use a method that learns the mask implicitly while predicting the clean T-F representation. Moreover, we show that a vanilla GAN fails to predict an accurate mask, and we propose a regularized objective function that adds the Mean Square Error (MSE) between the predicted and target spectra to overcome this failure. Objective evaluation of the proposed method shows more accurate mask prediction than state-of-the-art ML-based optimization techniques. The proposed system significantly improves speech quality over a recent GAN-based speech enhancement system, while maintaining a better trade-off between low speech distortion and effective removal of the background interference present in the noisy mixture.
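The two ideas named in the abstract, implicit mask learning and an MSE-regularized adversarial objective, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not give the network architecture, the exact adversarial loss, or the weighting of the MSE term, so the sigmoid mask, the non-saturating generator loss, and the names enhance_with_implicit_mask, regularized_generator_loss, and mse_weight are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, used here to keep the mask in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def enhance_with_implicit_mask(noisy_spec, mask_logits):
    """Estimate the clean T-F representation as mask * noisy spectrum.

    The mask has no training target of its own: it is learned implicitly
    because the loss is computed on the enhanced spectrum.
    (Assumption: a sigmoid-bounded mask; the abstract does not specify this.)
    """
    mask = sigmoid(mask_logits)
    return mask * noisy_spec, mask


def regularized_generator_loss(d_fake, enhanced, clean, mse_weight=1.0):
    """Adversarial generator loss plus an MSE regularizer.

    d_fake     : discriminator outputs in (0, 1] for the enhanced spectra
    enhanced   : predicted clean spectrum (mask * noisy)
    clean      : target clean spectrum
    mse_weight : hypothetical trade-off weight, not taken from the paper
    """
    eps = 1e-12  # avoids log(0)
    adversarial = -np.mean(np.log(d_fake + eps))  # assumed non-saturating form
    mse = np.mean((enhanced - clean) ** 2)
    return adversarial + mse_weight * mse


# Toy usage: one frame with two frequency bins.
noisy = np.array([[2.0, 4.0]])
logits = np.zeros_like(noisy)  # sigmoid(0) = 0.5 for every bin
enhanced, mask = enhance_with_implicit_mask(noisy, logits)
loss = regularized_generator_loss(np.array([1.0]), enhanced, np.zeros_like(noisy), mse_weight=2.0)
```

In this toy case the discriminator is fully fooled (d_fake = 1), so the adversarial term is close to zero and the loss is driven by the MSE term alone, which is the pressure the regularizer is meant to add when the adversarial signal by itself does not constrain the mask.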
Pages: 5039-5043
Page count: 5