TCNN: TEMPORAL CONVOLUTIONAL NEURAL NETWORK FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN

Authors
Pandey, Ashutosh [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
noise-independent and speaker-independent speech enhancement; real-time implementation; time domain; temporal convolutional neural network; TCNN; NOISE;
DOI
10.1109/icassp.2019.8683634
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
This work proposes a fully convolutional neural network (CNN) for real-time speech enhancement in the time domain. The proposed CNN is an encoder-decoder architecture with an additional temporal convolutional module (TCM) inserted between the encoder and the decoder. We call this architecture a Temporal Convolutional Neural Network (TCNN). The encoder in the TCNN creates a low-dimensional representation of a noisy input frame. The TCM uses causal and dilated convolutional layers to exploit the encoder outputs of the current and previous frames. The decoder uses the TCM output to reconstruct the enhanced frame. The proposed model is trained in a speaker- and noise-independent way. Experimental results demonstrate that the proposed model gives consistently better enhancement results than a state-of-the-art real-time convolutional recurrent model. Moreover, since the model is fully convolutional, it has far fewer trainable parameters than earlier models.
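The core operation of the TCM, a causal dilated convolution, can be sketched as follows. This is a minimal single-channel illustration in plain Python, not the authors' implementation: the function names `causal_dilated_conv1d` and `receptive_field`, the kernel values, and the dilation schedule 1, 2, 4, 8, 16, 32 are hypothetical choices made for illustration and are not details given in this record. The sketch shows the two properties the abstract relies on: each output depends only on the current and previous inputs (causality), and stacking layers with growing dilation lets a few layers see far back in time.

```python
from typing import List


def causal_dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Single-channel causal dilated 1-D convolution (illustrative sketch).

    y[t] = sum_k kernel[k] * x[t - k * dilation], treating x[t] = 0 for t < 0.
    kernel[0] weights the current sample; later taps reach back in time,
    so y[t] never depends on x[t + 1], x[t + 2], ... (causality).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # only current or past positions
            if idx >= 0:            # implicit zero padding on the left
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input steps that can influence one output of a stack
    of causal dilated layers (one layer per entry in `dilations`)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # An impulse passed through one layer with kernel [1, 1] and dilation 2
    # reappears two steps later, never earlier.
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
    print(causal_dilated_conv1d(impulse, [1.0, 1.0], dilation=2))  # [1.0, 0.0, 1.0, 0.0, 0.0]

    # Hypothetical schedule: kernel size 3, dilations doubling over six layers.
    print(receptive_field(3, [1, 2, 4, 8, 16, 32]))  # 127
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of weights grows only linearly, which is the usual motivation for dilated causal convolutions over long temporal contexts. Because no output looks at future inputs, a model built from such layers can process each new frame as soon as it arrives, which is what frame-by-frame real-time enhancement requires.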
Pages: 6875-6879
Number of pages: 5