Monaural Speech Dereverberation Using Deformable Convolutional Networks

Cited by: 2
Authors
Kothapally, Vinay [1 ]
Hansen, John H. L. [1 ]
Affiliations
[1] Univ Texas Dallas, Ctr Robust Speech Syst, Richardson, TX 75080 USA
Keywords
Speech enhancement; monaural dereverberation; deformable convolutional networks; minimum variance distortionless response; deep filtering; TIME-FREQUENCY MASKING; NEURAL-NETWORK; SELF-ATTENTION; ENHANCEMENT; NOISE; OPTIMIZATION; FRAMEWORK; DOMAIN; CNN;
DOI
10.1109/TASLP.2024.3358720
CLC (Chinese Library Classification) number
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
Reverberation and background noise degrade the quality and intelligibility of speech captured by a distant microphone. In recent years, researchers have developed several deep learning (DL)-based single-channel speech dereverberation systems that aim to minimize the distortions introduced into speech captured in naturalistic environments. Most of these DL-based systems enhance an unseen distorted speech signal by applying a predetermined set of weights to regions of the speech spectrogram, regardless of the degree of distortion within the respective regions. Such a system may not be an ideal solution for the dereverberation task. To address this, we present a DL-based end-to-end single-channel speech dereverberation system built on deformable convolutional networks (DCNs), which dynamically adjust their receptive fields based on the degree of distortion within an unseen speech signal. The proposed system includes the following components to simultaneously enhance the magnitude and phase responses of speech, leading to improved perceptual quality: (i) a complex spectrum enhancement module that uses a multi-frame filtering technique to implicitly correct the phase response, (ii) a magnitude enhancement module that suppresses dominant reflections and recovers the formant structure using the deep filtering (DF) technique, and (iii) a speech activity detection (SAD) estimation module that predicts frame-wise speech activity to suppress residuals in non-speech regions. We assess the performance of the proposed system using objective speech quality metrics on both simulated and real speech recordings from the REVERB challenge corpus. The experimental results demonstrate the benefits of using DCNs and multi-frame filtering for the speech dereverberation task. We compare our proposed system against other signal processing (SP) and DL-based systems and observe that it consistently outperforms them across all speech quality metrics.
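To make the "dynamically adjusted receptive field" idea concrete: a deformable convolution augments a regular k x k kernel with learned per-position fractional offsets, and samples the input at the deformed locations via bilinear interpolation. The numpy sketch below is illustrative only and is not the authors' implementation; the function names and the single-channel, square-kernel simplification are my own assumptions. In the paper's setting, the input grid would be a time-frequency representation of speech and the offsets would be predicted by a small network.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample a 2-D map feat at fractional coordinates (y, x).
    Out-of-range coordinates are clamped to the border."""
    H, W = feat.shape
    y = np.clip(y, 0, H - 1)
    x = np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def deformable_conv2d(feat, weight, offsets):
    """Single-channel 2-D deformable convolution (illustrative sketch).
    feat:    (H, W) input, e.g. a log-magnitude spectrogram patch
    weight:  (k, k) kernel, k odd
    offsets: (H, W, k*k, 2) per-position (dy, dx) deformations of the
             regular k x k sampling grid; in a DCN these are learned
    Returns an (H, W) output ('same' size via border clamping)."""
    H, W = feat.shape
    k = weight.shape[0]
    r = k // 2
    # Regular sampling grid around each output position.
    grid = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for n, (dy, dx) in enumerate(grid):
                oy, ox = offsets[i, j, n]  # learned deformation of tap n
                acc += weight[dy + r, dx + r] * bilinear_sample(
                    feat, i + dy + oy, j + dx + ox)
            out[i, j] = acc
    return out
```

With all offsets at zero this reduces to an ordinary convolution with clamped borders; nonzero offsets let each output position pull context from, say, a longer stretch of time frames in heavily reverberant regions while staying local in clean ones, which is the behavior the abstract attributes to DCNs.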
Pages: 1712-1723 (12 pages)