Know Your Enemy, Know Yourself: A Unified Two-Stage Framework for Speech Enhancement

Cited by: 13
Authors
Liu, Wenzhe [1 ,2 ]
Li, Andong [1 ,2 ]
Ke, Yuxuan [1 ,2 ]
Zheng, Chengshi [1 ,2 ]
Li, Xiaodong [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
INTERSPEECH 2021 | 2021
Keywords
speech enhancement; dereverberation; unified framework; noise awareness; deep neural networks; NOISE; DEREVERBERATION;
DOI
10.21437/Interspeech.2021-238
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification Codes
100104; 100213;
Abstract
Traditional spectral-subtraction-type single-channel speech enhancement (SE) algorithms typically need to estimate interference components, including noise and/or reverberation, before subtracting them, whereas deep-neural-network-based SE methods usually aim to learn an end-to-end mapping to the target. In this paper, we show that both denoising and dereverberation can be unified into a common problem by introducing a two-stage paradigm, namely interference component estimation followed by speech recovery. In the first stage, we explicitly extract the magnitude of the interference components, which serves as prior information. In the second stage, guided by this estimated magnitude prior, the target speech can be recovered more accurately. In addition, we propose a transform module to facilitate interaction between the interference and desired-speech modalities, and a temporal fusion module is designed to model long-term dependencies without ignoring short-term details. We conduct experiments on the WSJ0-SI84 corpus, and the results on both denoising and dereverberation tasks show that our approach outperforms previous advanced systems and achieves state-of-the-art performance on many objective metrics.
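The abstract describes the two-stage paradigm only at a high level. The following short PyTorch sketch is an editorial illustration of that paradigm, not the paper's actual architecture: the module and variable names, the use of plain GRU layers in place of the paper's transform and temporal fusion modules, and all layer sizes (e.g., 161 frequency bins) are assumptions made purely for illustration.

# Minimal sketch of a two-stage pipeline: stage 1 estimates the interference
# (noise/reverberation) magnitude, stage 2 uses it as a prior to recover speech.
import torch
import torch.nn as nn

class InterferenceEstimator(nn.Module):
    """Stage 1: estimate the magnitude spectrum of the interference components."""
    def __init__(self, n_freq=161, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, noisy_mag):              # (batch, frames, freq)
        h, _ = self.rnn(noisy_mag)
        return torch.relu(self.out(h))         # non-negative interference magnitude prior

class SpeechRecovery(nn.Module):
    """Stage 2: recover the target speech guided by the interference prior."""
    def __init__(self, n_freq=161, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(2 * n_freq, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, noisy_mag, interference_mag):
        x = torch.cat([noisy_mag, interference_mag], dim=-1)   # fuse noisy input with prior
        h, _ = self.rnn(x)
        mask = torch.sigmoid(self.out(h))      # bounded magnitude mask
        return mask * noisy_mag

class TwoStageSE(nn.Module):
    """Unified two-stage model: interference estimation, then speech recovery."""
    def __init__(self, n_freq=161):
        super().__init__()
        self.stage1 = InterferenceEstimator(n_freq)
        self.stage2 = SpeechRecovery(n_freq)

    def forward(self, noisy_mag):
        interference = self.stage1(noisy_mag)
        enhanced = self.stage2(noisy_mag, interference)
        return enhanced, interference

if __name__ == "__main__":
    model = TwoStageSE(n_freq=161)
    noisy = torch.rand(2, 100, 161)            # (batch, frames, freq bins)
    enhanced, interference = model(noisy)
    print(enhanced.shape, interference.shape)

In practice both stages would be trained jointly (e.g., with losses on the interference estimate and on the enhanced spectrum), so that the explicit interference prior guides the second stage as the abstract describes.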
Pages: 186 - 190
Number of pages: 5