TIME-FREQUENCY MASKING-BASED SPEECH ENHANCEMENT USING GENERATIVE ADVERSARIAL NETWORK

被引:0
|
作者
Soni, Meet H. [1 ]
Shah, Neil [1 ]
Patil, Hemant A. [1 ]
机构
[1] Dhirubhai Ambani Inst Informat & Commun Technol, Gandhinagar, India
关键词
Task-dependent masking; speech enhancement; generative adversarial networks;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
The success of time-frequency (T-F) mask-based approaches is dependent on the accuracy of predicted mask given the noisy spectral features. The state-of-the-art methods in T-F masking-based enhancement employ Deep Neural Network (DNN) to predict mask. Recently, Generative Adversarial Networks (GAN) are gaining popularity instead of maximum likelihood (ML)-based optimization of deep learning architectures. In this paper, we propose to exploit GAN in TF masking-based enhancement framework. We present the viable strategy to use GAN in such application by modifying the existing approach. To achieve this, we use a method that learns the mask implicitly while predicting the clean TF representation. Moreover, we show the failure of vanilla GAN in predicting the accurate mask and propose a regularized objective function with the use of Mean Square Error (MSE) between predicted and target spectrum to overcome it. The objective evaluation of the proposed method shows the improvement in the accurate mask prediction, as against the state-of-the-art ML-based optimization techniques. The proposed system significantly improves over a recent GAN-based speech enhancement system in improving speech quality, while maintaining a better trade-off between less speech distortion and more effective removal of background interferences present in the noisy mixture.
引用
收藏
页码:5039 / 5043
页数:5
相关论文
共 50 条
  • [11] Speech Enhancement Using Generative Adversarial Network (GAN)
    Huq, Mahmudul
    Maskeliunas, Rytis
    HYBRID INTELLIGENT SYSTEMS, HIS 2021, 2022, 420 : 273 - 282
  • [12] GSC Based Speech Enhancement with Generative Adversarial Network
    Zhou, Yao
    Bao, Changchun
    Cheng, Rui
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 901 - 906
  • [13] AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks
    Abdulatif, Sherif
    Armanious, Karim
    Guirguis, Karim
    Sajeev, Jayasankar T.
    Yang, Bin
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 451 - 455
  • [14] Robust speech separation using time-frequency masking
    Aarabi, P
    Shi, GJ
    Jahromi, O
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003, : 741 - 744
  • [15] On time-frequency masking in voiced speech
    Skoglund, J
    Kleijn, WB
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04): : 361 - 369
  • [16] Enhancement of Alaryngeal Speech using Generative Adversarial Network (GAN)
    Huq, Mahmudul
    2021 IEEE/ACS 18TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2021,
  • [17] SEGAN: Speech Enhancement Generative Adversarial Network
    Pascual, Santiago
    Bonafonte, Antonio
    Serra, Joan
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3642 - 3646
  • [18] A time-frequency smoothing neural network for speech enhancement
    Yuan, Wenhao
    SPEECH COMMUNICATION, 2020, 124 : 75 - 84
  • [19] MULTICHANNEL SPEECH ENHANCEMENT BASED ON TIME-FREQUENCY MASKING USING SUBBAND LONG SHORT-TERM MEMORY
    Li, Xiaofei
    Horaud, Radu
    2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, : 298 - 302
  • [20] Time frequency masking based speech enhancement using deep encoder-decoder neural network
    Shi, Wenhua
    Zhang, Xiongwei
    Zou, Xia
    Sun, Meng
    Li, Li
    Shengxue Xuebao/Acta Acustica, 2020, 45 (03): : 299 - 307