TIME-FREQUENCY MASKING-BASED SPEECH ENHANCEMENT USING GENERATIVE ADVERSARIAL NETWORK

Cited by: 0
Authors
Soni, Meet H. [1]
Shah, Neil [1]
Patil, Hemant A. [1]
Affiliations
[1] Dhirubhai Ambani Inst Informat & Commun Technol, Gandhinagar, India
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
Task-dependent masking; speech enhancement; generative adversarial networks;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
The success of time-frequency (T-F) mask-based approaches depends on how accurately the mask is predicted from the noisy spectral features. State-of-the-art methods in T-F masking-based enhancement employ a Deep Neural Network (DNN) to predict the mask. Recently, Generative Adversarial Networks (GANs) have been gaining popularity as an alternative to maximum likelihood (ML)-based optimization of deep learning architectures. In this paper, we propose to exploit GANs in a T-F masking-based enhancement framework. We present a viable strategy for using a GAN in such an application by modifying the existing approach. To achieve this, we use a method that learns the mask implicitly while predicting the clean T-F representation. Moreover, we show that a vanilla GAN fails to predict an accurate mask, and we propose a regularized objective function that adds the Mean Square Error (MSE) between the predicted and target spectra to overcome this failure. Objective evaluation of the proposed method shows more accurate mask prediction than state-of-the-art ML-based optimization techniques. The proposed system significantly improves speech quality over a recent GAN-based speech enhancement system, while maintaining a better trade-off between low speech distortion and effective removal of the background interference present in the noisy mixture.
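The two ideas named in the abstract, a mask that is learned implicitly while predicting the clean T-F representation and a generator objective regularized by MSE, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sigmoid-bounded mask, the non-saturating adversarial term, the weight `lam`, and all function names are hypothetical.

```python
import math


def sigmoid(x):
    """Squash a real value into (0, 1), used here to bound the mask."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_implicit_mask(noisy, mask_logits):
    """Multiply the noisy T-F features by a mask derived from network outputs.

    Assumption: the mask is an intermediate quantity, and only the enhanced
    output is compared with the clean target, so the mask is learned implicitly.
    """
    mask = [[sigmoid(v) for v in row] for row in mask_logits]
    enhanced = [
        [m * n for m, n in zip(mask_row, noisy_row)]
        for mask_row, noisy_row in zip(mask, noisy)
    ]
    return enhanced, mask


def mse(a, b):
    """Mean square error over all T-F bins of two equally shaped matrices."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def regularized_generator_loss(d_fake, enhanced, clean, lam=1.0):
    """Adversarial term plus lam * MSE (hypothetical form and weighting).

    d_fake holds discriminator probabilities for the enhanced outputs.
    """
    adversarial = -sum(math.log(p + 1e-12) for p in d_fake) / len(d_fake)
    return adversarial + lam * mse(enhanced, clean)


# Toy usage: two frames by two frequency bins.
noisy = [[1.0, 2.0], [4.0, 0.0]]
logits = [[0.0, 0.0], [0.0, 0.0]]
clean = [[1.5, 1.0], [2.0, 0.0]]
enhanced, mask = apply_implicit_mask(noisy, logits)
loss = regularized_generator_loss([1.0], enhanced, clean, lam=2.0)
```

In this sketch the MSE term ties the generator output to the target spectrum, which is the role the abstract assigns to the regularizer; the adversarial term alone places no such bin-by-bin constraint on the output.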
Pages: 5039-5043
Page count: 5
Related papers
50 in total
  • [11] MULTICHANNEL SPEECH ENHANCEMENT BASED ON TIME-FREQUENCY MASKING USING SUBBAND LONG SHORT-TERM MEMORY
    Li, Xiaofei
    Horaud, Radu
    2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, : 298 - 302
  • [12] A Loss With Mixed Penalty for Speech Enhancement Generative Adversarial Network
    Cao, Jie
    Zhou, Yaofeng
    Yu, Hong
    Li, Xiaoxu
    Wang, Dan
    Ma, Zhanyu
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 86 - 90
  • [13] Speech Enhancement via Residual Dense Generative Adversarial Network
    Zhou, Lin
    Zhong, Qiuyue
    Wang, Tianyi
    Lu, Siyuan
    Hu, Hongmei
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2021, 38 (03): : 279 - 289
  • [14] LANGUAGE AND NOISE TRANSFER IN SPEECH ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK
    Pascual, Santiago
    Park, Maruchan
    Serra, Joan
    Bonafonte, Antonio
    Ahn, Kang-Hun
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5019 - 5023
  • [15] A Data Field method for speech enhancement incorporating Binary Time-Frequency Masking
    Huang, Jianjun
    Zhang, Yafei
    Zhang, Xiongwei
    Zhu, Tao
    PRZEGLAD ELEKTROTECHNICZNY, 2011, 87 (07): : 225 - 229
  • [16] Post-processing in masking-based β-order MMSE speech enhancement
    Zhang, Xinxin
    Koh, Soo Ngee
    Soon, Ing Yann
    You, Changhuai
    APPLIED ACOUSTICS, 2008, 69 (04) : 354 - 357
  • [17] TIME-FREQUENCY MASKING BASED ONLINE SPEECH ENHANCEMENT WITH MULTI-CHANNEL DATA USING CONVOLUTIONAL NEURAL NETWORKS
    Chakrabarty, Soumitro
    Wang, DeLiang
    Habets, Emanuel A. P.
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 476 - 480
  • [18] Time-domain speech enhancement using generative adversarial networks
    Pascual, Santiago
    Serra, Joan
    Bonafonte, Antonio
    SPEECH COMMUNICATION, 2019, 114 : 10 - 21
  • [19] VSEGAN: VISUAL SPEECH ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK
    Xu, Xinmeng
    Wang, Yang
    Xu, Dongxiang
    Peng, Yiyuan
    Zhang, Cong
    Jia, Jie
    Chen, Binbin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7307 - 7311
  • [20] Single-channel speech enhancement using improved progressive deep neural network and masking-based harmonic regeneration
    Ping, Huang
    Yafeng, Wu
    SPEECH COMMUNICATION, 2022, 145 : 36 - 46