Monaural Speech Dereverberation Using Deformable Convolutional Networks

Cited by: 2
Authors
Kothapally, Vinay [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Ctr Robust Speech Syst, Richardson, TX 75080 USA
Keywords
Speech enhancement; monaural dereverberation; deformable convolutional networks; minimum variance distortionless response; deep filtering; TIME-FREQUENCY MASKING; NEURAL-NETWORK; SELF-ATTENTION; ENHANCEMENT; NOISE; OPTIMIZATION; FRAMEWORK; DOMAIN; CNN;
DOI
10.1109/TASLP.2024.3358720
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Reverberation and background noise can degrade speech quality and intelligibility when speech is captured by a distant microphone. In recent years, researchers have developed several deep learning (DL)-based single-channel speech dereverberation systems that aim to minimize distortions introduced into speech captured in naturalistic environments. A majority of these DL-based systems enhance an unseen distorted speech signal by applying a predetermined set of weights to regions of the speech spectrogram, regardless of the degree of distortion within the respective regions. Such a system might not be an ideal solution for the dereverberation task. To address this, we present a DL-based end-to-end single-channel speech dereverberation system that uses deformable convolutional networks (DCNs), which dynamically adjust their receptive field based on the degree of distortion within an unseen speech signal. The proposed system includes the following components to simultaneously enhance the magnitude and phase responses of speech, which leads to improved perceptual quality: (i) a complex spectrum enhancement module that uses a multi-frame filtering technique to implicitly correct the phase response, (ii) a magnitude enhancement module that suppresses dominant reflections and recovers the formant structure using a deep filtering (DF) technique, and (iii) a speech activity detection (SAD) estimation module that predicts frame-wise speech activity to suppress residuals in non-speech regions. We assess the performance of the proposed system by employing objective speech quality metrics on both simulated and real speech recordings from the REVERB challenge corpus. The experimental results demonstrate the benefits of using DCNs and multi-frame filtering for the speech dereverberation task. We compare the performance of our proposed system against other signal processing (SP) and DL-based systems and observe that it consistently outperforms other approaches across all speech quality metrics.
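The multi-frame (deep) filtering mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common causal formulation, in which each enhanced time-frequency bin is a complex-weighted sum of the current and previous frames in the same frequency bin. The function name `multi_frame_filter`, the array shapes, and the random coefficients standing in for a network's output are all hypothetical; this record does not give the authors' implementation, which may differ.

```python
import numpy as np

def multi_frame_filter(spec, coefs):
    """Apply per-bin complex filter taps over the current and past frames.

    spec:  complex STFT, shape (frames, bins)
    coefs: complex taps, shape (taps, frames, bins); tap i weights frame t - i
    (Shapes and the causal layout are assumptions made for illustration.)
    """
    spec = np.asarray(spec, dtype=np.complex128)
    coefs = np.asarray(coefs, dtype=np.complex128)
    taps, frames, bins_ = coefs.shape
    assert spec.shape == (frames, bins_)
    # Zero-pad the past so the earliest frames see an all-zero history.
    history = np.zeros((taps - 1, bins_), dtype=spec.dtype)
    padded = np.concatenate([history, spec], axis=0)
    out = np.zeros_like(spec)
    for i in range(taps):
        start = taps - 1 - i  # slice of `padded` equal to `spec` delayed by i frames
        out += coefs[i] * padded[start:start + frames]
    return out

# Usage with random stand-in data (a real system would predict `coefs`).
rng = np.random.default_rng(0)
noisy = rng.standard_normal((100, 257)) + 1j * rng.standard_normal((100, 257))
taps = rng.standard_normal((5, 100, 257)) + 1j * rng.standard_normal((5, 100, 257))
enhanced = multi_frame_filter(noisy, taps)
```

Because the taps are complex and span several frames, the output phase can differ from the input phase, which is one way to read the abstract's remark about implicitly correcting the phase response.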
Pages: 1712-1723
Page count: 12
Related papers
50 in total
  • [41] Monaural speech separation using GA-DNN integration scheme
    Sivapatham, Shoba
    Ramadoss, Rajavel
    Kar, Asutosh
    Majhi, Banshidhar
    APPLIED ACOUSTICS, 2020, 160
  • [42] Monaural Speech Separation Using Speaker Embedding From Preliminary Separation
    Byun, Jaeuk
    Shin, Jong Won
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2753 - 2763
  • [43] BINAURAL SPEECH ENHANCEMENT USING COMPLEX CONVOLUTIONAL RECURRENT NETWORKS
    Tokala, Vikas
    Grinstein, Eric
    Brookes, Mike
    Doclo, Simon
    Jensen, Jesper
    Naylor, Patrick A.
    FIFTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, IEEECONF, 2023, : 1130 - 1134
  • [44] Face Detection Using R-FCN Based Deformable Convolutional Networks
    Chen, Qiaosong
    Shen, Fahai
    Ding, Yuanyuan
    Gong, Panhao
    Tao, Ya
    Wang, Jin
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 4165 - 4170
  • [45] Harmonic Attention for Monaural Speech Enhancement
    Wang, Tianrui
    Zhu, Weibin
    Gao, Yingying
    Zhang, Shilei
    Feng, Junlan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2424 - 2436
  • [46] DRC-NET: DENSELY CONNECTED RECURRENT CONVOLUTIONAL NEURAL NETWORK FOR SPEECH DEREVERBERATION
    Liu, Jinjiang
    Zhang, Xueliang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 166 - 170
  • [47] Speech Recognition of Punjabi Numerals Using Convolutional Neural Networks
    Aditi, Thakur
    Karun, Verma
    ADVANCES IN COMPUTER COMMUNICATION AND COMPUTATIONAL SCIENCES, VOL 1, 2019, 759 : 61 - 69
  • [48] SPEECH DEREVERBERATION AND DENOISING USING COMPLEX RATIO MASKS
    Williamson, Donald S.
    Wang, DeLiang
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5590 - 5594
  • [49] Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks
    Liu, Chang-Le
    Fu, Sze-Wei
    Li, You-Jin
    Huang, Jen-Wei
    Wang, Hsin-Min
    Tsao, Yu
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 (28) : 1888 - 1900
  • [50] Speech Emotion Recognition and Deep Learning: An Extensive Validation Using Convolutional Neural Networks
    Ri, Francesco Ardan Dal
    Ciardi, Fabio Cifariello
    Conci, Nicola
    IEEE ACCESS, 2023, 11 : 116638 - 116649