TCNN: TEMPORAL CONVOLUTIONAL NEURAL NETWORK FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN

Authors
Pandey, Ashutosh [1 ]
Wang, DeLiang [1 ,2 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019
Keywords
noise-independent and speaker-independent speech enhancement; real-time implementation; time domain; temporal convolutional neural network; TCNN; NOISE;
DOI
10.1109/icassp.2019.8683634
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
This work proposes a fully convolutional neural network (CNN) for real-time speech enhancement in the time domain. The proposed CNN is an encoder-decoder architecture with an additional temporal convolutional module (TCM) inserted between the encoder and the decoder. We call this architecture a Temporal Convolutional Neural Network (TCNN). The encoder in the TCNN creates a low-dimensional representation of a noisy input frame. The TCM uses causal and dilated convolutional layers to utilize the encoder output of the current and previous frames. The decoder uses the TCM output to reconstruct the enhanced frame. The proposed model is trained in a speaker- and noise-independent way. Experimental results demonstrate that the proposed model gives consistently better enhancement results than a state-of-the-art real-time convolutional recurrent model. Moreover, since the model is fully convolutional, it has far fewer trainable parameters than earlier models.
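The TCM described in the abstract stacks causal, dilated convolutional layers so that each output frame depends only on the current and previous frames, which is what makes the model suitable for real-time use. A minimal NumPy sketch of one such causal dilated 1-D convolution and the receptive-field arithmetic for a dilation stack (the kernel size and dilation schedule below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution: the output at time t depends only on
    x[t], x[t - dilation], ..., x[t - (k-1)*dilation] -- never on the future."""
    k = len(w)
    pad = (k - 1) * dilation  # left-pad with zeros so the output stays causal
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# Receptive field of a stack of causal dilated layers with kernel size k:
#   R = 1 + sum over layers of (k - 1) * dilation
k = 3
dilations = [2 ** i for i in range(6)]  # illustrative schedule: 1, 2, 4, 8, 16, 32
receptive_field = 1 + sum((k - 1) * d for d in dilations)
print(receptive_field)  # -> 127 current-and-past samples seen per output frame
```

Feeding an impulse through the layer confirms causality: the response appears only at and after the impulse, at multiples of the dilation, which is how the stacked TCM layers accumulate context from previous frames without looking ahead.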
Pages: 6875-6879 (5 pages)