Monaural Speech Dereverberation Using Deformable Convolutional Networks

Cited by: 2

Authors:

Kothapally, Vinay [1]

Hansen, John H. L. [1]

Affiliations:

[1] Univ Texas Dallas, Ctr Robust Speech Syst, Richardson, TX 75080 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Keywords:

Speech enhancement; monaural dereverberation; deformable convolutional networks; minimum variance distortionless response; deep filtering; TIME-FREQUENCY MASKING; NEURAL-NETWORK; SELF-ATTENTION; ENHANCEMENT; NOISE; OPTIMIZATION; FRAMEWORK; DOMAIN; CNN;

DOI:

10.1109/TASLP.2024.3358720

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Reverberation and background noise can degrade speech quality and intelligibility when speech is captured by a distant microphone. In recent years, researchers have developed several deep learning (DL)-based single-channel speech dereverberation systems that aim to minimize the distortions introduced into speech captured in naturalistic environments. A majority of these DL-based systems enhance an unseen distorted speech signal by applying a predetermined set of weights to regions of the speech spectrogram, regardless of the degree of distortion within the respective regions. Such a system might not be an ideal solution for the dereverberation task. To address this, we present a DL-based end-to-end single-channel speech dereverberation system that uses deformable convolutional networks (DCNs), which dynamically adjust their receptive field based on the degree of distortion within an unseen speech signal. The proposed system includes the following components to simultaneously enhance the magnitude and phase responses of speech, which leads to improved perceptual quality: (i) a complex spectrum enhancement module that uses a multi-frame filtering technique to implicitly correct the phase response, (ii) a magnitude enhancement module that suppresses dominant reflections and recovers the formant structure using a deep filtering (DF) technique, and (iii) a speech activity detection (SAD) estimation module that predicts frame-wise speech activity to suppress residuals in non-speech regions. We assess the performance of the proposed system by employing objective speech quality metrics on both simulated and real speech recordings from the REVERB challenge corpus. The experimental results demonstrate the benefits of using DCNs and multi-frame filtering for the speech dereverberation task. We compare the performance of our proposed system against other signal processing (SP) and DL-based systems and observe that it consistently outperforms the other approaches across all speech quality metrics.
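
The multi-frame filtering mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `multi_frame_filter`, the causal tap layout, and the zero-padding at the start of the signal are all assumptions made for illustration. The idea shown is only that each enhanced time-frequency bin is a weighted sum of the current and previous complex STFT frames, so complex-valued taps can change phase as well as magnitude. In a DL system the taps would be predicted by a network; here they are supplied by hand.

```python
def multi_frame_filter(spec, coeffs):
    """Apply a per-bin, multi-frame complex filter to an STFT (illustrative only).

    spec:   list of T frames, each a list of F complex STFT values.
    coeffs: coeffs[t][f] is a list of N complex taps; tap n weights spec[t - n][f].
            (Assumed layout; real systems may also filter across frequency
            or apply the complex conjugate of the taps.)
    """
    num_frames = len(spec)
    num_bins = len(spec[0]) if spec else 0
    out = []
    for t in range(num_frames):
        frame = []
        for f in range(num_bins):
            acc = 0j
            for n, tap in enumerate(coeffs[t][f]):
                if t - n >= 0:  # frames before the signal start count as zero
                    acc += tap * spec[t - n][f]
            frame.append(acc)
        out.append(frame)
    return out


# Toy input: 3 frames, 2 frequency bins.
spec = [[1 + 1j, 2 + 0j], [0 + 2j, 1 - 1j], [3 + 0j, 0 + 1j]]

# Two taps per bin; [1, 0] passes the current frame through unchanged.
identity = [[[1 + 0j, 0j] for _ in frame] for frame in spec]
enhanced = multi_frame_filter(spec, identity)
```

With a single tap per bin this reduces to ordinary complex time-frequency masking; adding taps lets the filter draw on neighbouring frames, which is what makes it relevant to reverberation, where energy from earlier frames leaks into later ones.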

Pages: 1712-1723

Page count: 12

