REAL-TIME DENOISING AND DEREVERBERATION WITH TINY RECURRENT U-NET

Cited by: 44
Authors
Choi, Hyeong-Seok [1 ,2 ]
Park, Sungjin [1 ]
Lee, Jie Hwan [2 ]
Heo, Hoon [2 ]
Jeon, Dongsuk [1 ]
Lee, Kyogu [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, Artificial Intelligence Inst, Dept Intelligence & Informat, Seoul, South Korea
[2] Supertone Inc, Canoga Pk, CA 91307 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
real-time speech enhancement; lightweight network; denoising; dereverberation;
DOI
10.1109/ICASSP39728.2021.9414852
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Modern deep learning-based models have achieved outstanding performance on speech enhancement tasks. However, the number of parameters in state-of-the-art models is often too large for deployment on devices in real-world applications. To this end, we propose Tiny Recurrent U-Net (TRU-Net), a lightweight online inference model that matches the performance of current state-of-the-art models. The quantized version of TRU-Net is 362 kilobytes, small enough to be deployed on edge devices. In addition, we combine the small model with a new masking method, the phase-aware beta-sigmoid mask, which enables simultaneous denoising and dereverberation. Both objective and subjective evaluations show that our model achieves performance competitive with current state-of-the-art models on benchmark datasets while using orders of magnitude fewer parameters.
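The phase-aware beta-sigmoid mask itself is not defined in this record. As a rough illustration of the general idea of a beta-scaled sigmoid mask (the function name, the scalar shapes, and the role of `beta` as an upper bound on the gain are all assumptions, not the paper's definition, and the phase-handling part is omitted entirely):

```python
import math

def beta_sigmoid_mask(logit, beta):
    # Illustrative only: a sigmoid scaled by beta, so the mask value lies
    # in (0, beta) and can exceed 1 when beta > 1.  The paper's
    # phase-aware beta-sigmoid mask also estimates phase; this sketch
    # covers only a magnitude gain (an assumption for illustration).
    return beta / (1.0 + math.exp(-logit))

# Toy usage: apply the gain to one time-frequency bin's magnitude.
noisy_magnitude = 0.8   # |X(t, f)| of the noisy spectrogram (toy value)
logit = 1.5             # hypothetical network output for this bin
gain = beta_sigmoid_mask(logit, beta=2.0)
enhanced_magnitude = gain * noisy_magnitude
```

Allowing the gain to exceed 1 is one plausible reason for scaling the sigmoid: a plain sigmoid mask caps the gain at 1, which can be limiting when the enhanced magnitude must exceed the observed one.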
Pages: 5789-5793
Page count: 5
References
32 records in total
[1] [Anonymous], 2013, Comput. Rev.
[2] ITU-T, 2001, Recommendation P.862.
[3] Braun S., 2020, Speech and Computer: 22nd International Conference (SPECOM 2020), LNAI 12335, p. 79. DOI: 10.1007/978-3-030-60276-5_8
[4] Cho K., et al., 2014, arXiv:1406.1078. DOI: 10.3115/v1/D14-1179
[5] Choi H.-S., et al., 2019, Proc. ICLR.
[6] Engel J., et al., 2020, arXiv:2001.04643.
[7] Erdogan H., Yoshioka T., 2018, Investigations on Data Augmentation and Loss Functions for Deep Learning Based Speech-Background Separation, Proc. INTERSPEECH 2018, pp. 3499-3503.
[8] Fedorov I., Stamenovic M., Jensen C., Yang L.-C., Mandell A., Gan Y., Mattina M., Whatmough P. N., 2020, TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids, Proc. INTERSPEECH 2020, pp. 4054-4058.
[9] Grzywalski T., 2019, Proc. IEEE ICASSP, p. 6970. DOI: 10.1109/ICASSP.2019.8682830
[10] Howard A. G., et al., 2017, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, arXiv:1704.04861.