REAL-TIME DENOISING AND DEREVERBERATION WITH TINY RECURRENT U-NET

Cited by: 38
Authors
Choi, Hyeong-Seok [1 ,2 ]
Park, Sungjin [1 ]
Lee, Jie Hwan [2 ]
Heo, Hoon [2 ]
Jeon, Dongsuk [1 ]
Lee, Kyogu [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, Artificial Intelligence Inst, Dept Intelligence & Informat, Seoul, South Korea
[2] Supertone Inc, Canoga Pk, CA 91307 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
real-time speech enhancement; lightweight network; denoising; dereverberation;
DOI
10.1109/ICASSP39728.2021.9414852
CLC Classification Code
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
Modern deep learning-based models have achieved outstanding performance on speech enhancement tasks. The number of parameters of state-of-the-art models, however, is often too large for them to be deployed on devices for real-world applications. To this end, we propose Tiny Recurrent U-Net (TRU-Net), a lightweight online inference model that matches the performance of current state-of-the-art models. The quantized version of TRU-Net is 362 kilobytes, small enough to be deployed on edge devices. In addition, we combine the small model with a new masking method called the phase-aware beta-sigmoid mask, which enables simultaneous denoising and dereverberation. Both objective and subjective evaluations show that our model achieves performance competitive with current state-of-the-art models on benchmark datasets while using orders of magnitude fewer parameters.
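The core masking idea named in the abstract — a magnitude mask bounded by a sigmoid scaled with a factor beta, applied together with a phase estimate to the complex STFT — can be sketched as follows. This is an illustrative sketch only: the function names, the fixed `beta=2.0`, and the simple additive phase correction are assumptions for exposition, not the paper's exact phase-aware beta-sigmoid mask formulation.

```python
import numpy as np

def beta_sigmoid_mask(logits, beta=2.0):
    # Bounded magnitude mask in (0, beta); with beta > 1 the mask can
    # amplify time-frequency bins, unlike a plain sigmoid capped at 1.
    # beta=2.0 is an illustrative choice, not the paper's value.
    return beta / (1.0 + np.exp(-logits))

def apply_mask(spec, mag_logits, phase_correction):
    # Apply a magnitude mask and an estimated phase rotation to a
    # complex STFT frame (hypothetical simplification of phase-aware
    # masking; the actual method may constrain phase differently).
    mask = beta_sigmoid_mask(mag_logits)
    return mask * np.abs(spec) * np.exp(1j * (np.angle(spec) + phase_correction))
```

A mask logit of 0 yields a gain of exactly beta/2, so with `beta=2.0` the bin passes through unchanged; large negative logits suppress a bin toward zero, which is the denoising/dereverberation action.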
Pages: 5789 - 5793
Page count: 5
References
32 items
  • [1] [Anonymous], 2013, Comput. Rev.
  • [2] Braun S., 2020, Speech and Computer: 22nd International Conference (SPECOM 2020), LNAI 12335, p. 79, DOI 10.1007/978-3-030-60276-5_8
  • [3] Cho K. et al., 2014, arXiv:1406.1078, p. 1724, DOI 10.3115/v1/D14-1179
  • [4] Choi H.-S., 2019, Proc. ICLR
  • [5] Erdogan H., Yoshioka T., "Investigations on Data Augmentation and Loss Functions for Deep Learning Based Speech-Background Separation", Interspeech 2018, pp. 3499-3503
  • [6] Fedorov I., Stamenovic M., Jensen C., Yang L.-C., Mandell A., Gan Y., Mattina M., Whatmough P. N., "TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids", Interspeech 2020, pp. 4054-4058
  • [7] Grzywalski T., 2019, ICASSP 2019, p. 6970, DOI 10.1109/ICASSP.2019.8682830
  • [8] Hantrakul L., 2020, ICLR
  • [9] Howard A. G. et al., 2017, arXiv:1704.04861
  • [10] Hu Y., Liu Y., Lv S., Xing M., Zhang S., Fu Y., Wu J., Zhang B., Xie L., "DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement", Interspeech 2020, pp. 2472-2476