REAL-TIME DENOISING AND DEREVERBERATION WITH TINY RECURRENT U-NET

Cited by: 38
Authors
Choi, Hyeong-Seok [1 ,2 ]
Park, Sungjin [1 ]
Lee, Jie Hwan [2 ]
Heo, Hoon [2 ]
Jeon, Dongsuk [1 ]
Lee, Kyogu [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, Artificial Intelligence Inst, Dept Intelligence & Informat, Seoul, South Korea
[2] Supertone Inc, Canoga Pk, CA 91307 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
real-time speech enhancement; lightweight network; denoising; dereverberation;
DOI
10.1109/ICASSP39728.2021.9414852
CLC Classification Code
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
Modern deep learning-based models have achieved outstanding performance on speech enhancement tasks. The number of parameters of state-of-the-art models, however, is often too large for them to be deployed on devices for real-world applications. To this end, we propose Tiny Recurrent U-Net (TRU-Net), a lightweight online inference model that matches the performance of current state-of-the-art models. The quantized version of TRU-Net is 362 kilobytes, small enough to be deployed on edge devices. In addition, we combine the small model with a new masking method called the phase-aware beta-sigmoid mask, which enables simultaneous denoising and dereverberation. Both objective and subjective evaluations show that our model achieves performance competitive with current state-of-the-art models on benchmark datasets while using orders of magnitude fewer parameters.
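The core masking idea named in the abstract — a magnitude mask bounded by a sigmoid scaled with a factor beta, applied together with a phase estimate to the complex STFT — can be sketched as follows. This is an illustrative sketch only: the function names, the fixed `beta=2.0`, and the simple additive phase correction are assumptions for exposition, not the paper's exact phase-aware beta-sigmoid mask formulation.

```python
import numpy as np

def beta_sigmoid_mask(logits, beta=2.0):
    # Bounded magnitude mask in (0, beta); with beta > 1 the mask can
    # amplify time-frequency bins, unlike a plain sigmoid capped at 1.
    # beta=2.0 is an illustrative choice, not the paper's value.
    return beta / (1.0 + np.exp(-logits))

def apply_mask(spec, mag_logits, phase_correction):
    # Apply a magnitude mask and an estimated phase rotation to a
    # complex STFT frame (hypothetical simplification of phase-aware
    # masking; the actual method may constrain phase differently).
    mask = beta_sigmoid_mask(mag_logits)
    return mask * np.abs(spec) * np.exp(1j * (np.angle(spec) + phase_correction))
```

A mask logit of 0 yields a gain of exactly beta/2, so with `beta=2.0` the bin passes through unchanged; large negative logits suppress a bin toward zero, which is the denoising/dereverberation action.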
Pages: 5789 - 5793
Page count: 5
References
32 items
  • [1] [Anonymous], 2013, Comput. Rev.
  • [2] Braun S., 2020, Speech and Computer: 22nd International Conference (SPECOM 2020), LNAI 12335, p. 79, DOI 10.1007/978-3-030-60276-5_8
  • [3] Cho K. et al., 2014, arXiv:1406.1078, p. 1724, DOI 10.3115/v1/D14-1179
  • [4] Choi H.-S., 2019, Proc. ICLR
  • [5] Erdogan H., Yoshioka T., "Investigations on Data Augmentation and Loss Functions for Deep Learning Based Speech-Background Separation", Interspeech 2018, pp. 3499-3503
  • [6] Fedorov I., Stamenovic M., Jensen C., Yang L.-C., Mandell A., Gan Y., Mattina M., Whatmough P. N., "TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids", Interspeech 2020, pp. 4054-4058
  • [7] Grzywalski T., 2019, ICASSP 2019, p. 6970, DOI 10.1109/ICASSP.2019.8682830
  • [8] Hantrakul L., 2020, ICLR
  • [9] Howard A. G. et al., 2017, arXiv:1704.04861
  • [10] Hu Y., Liu Y., Lv S., Xing M., Zhang S., Fu Y., Wu J., Zhang B., Xie L., "DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement", Interspeech 2020, pp. 2472-2476