SPATIAL-DCCRN: DCCRN EQUIPPED WITH FRAME-LEVEL ANGLE FEATURE AND HYBRID FILTERING FOR MULTI-CHANNEL SPEECH ENHANCEMENT

Cited by: 5
Authors
Lv, Shubo [1, 2]
Fu, Yihui [1]
Jv, Yukai [1, 2]
Xie, Lei [1]
Zhu, Weixin [2]
Rao, Wei [2]
Wang, Yannan [2]
Affiliations
[1] Northwestern Polytechnical University, Audio, Speech and Language Processing Group (ASLP@NPU), Xi'an, People's Republic of China
[2] Tencent Corporation, Tencent Ethereal Audio Lab, Shenzhen, People's Republic of China
Source
2022 IEEE Spoken Language Technology Workshop (SLT), 2022
Keywords
multi-channel; Spatial-DCCRN; speech enhancement
DOI
10.1109/SLT54892.2023.10022488
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, multi-channel speech enhancement has drawn much interest because spatial information can be used to distinguish target speech from interfering signals. To make full use of spatial information and neural-network-based mask estimation, we propose a multi-channel denoising neural network, Spatial-DCCRN. First, we extend S-DCCRN to the multi-channel scenario, adopting a cascaded sub-channel and full-channel processing strategy that can model different channels separately. Moreover, instead of only using the multi-channel spectrum, or concatenating the first channel's magnitude with the inter-channel phase difference (IPD), as the model's input, we apply an angle feature extraction (AFE) module to extract frame-level angle feature embeddings, which help the model perceive spatial information more explicitly. Finally, since residual noise is more severe when noise and speech occupy the same time-frequency (TF) bin, we design a hybrid masking and mapping filtering method to replace the traditional filter-and-sum operation, cascading coarse denoising, dereverberation, and residual noise suppression. The proposed Spatial-DCCRN outperforms EaBNet, FaSNet, and several other competitive models on the L3DAS22 Challenge dataset. Beyond the 3D scenario, Spatial-DCCRN also outperforms the state-of-the-art (SOTA) model MIMO-UNet by a large margin on multiple evaluation metrics on the multi-channel ConferencingSpeech2021 Challenge dataset. Ablation studies further demonstrate the effectiveness of each contribution.
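The abstract positions Spatial-DCCRN against two common baselines: feeding the network the first channel's magnitude concatenated with IPD features, and combining channels with a filter-and-sum operation. The NumPy sketch below illustrates only those two baselines, not the paper's AFE module or its hybrid masking and mapping filter, whose details the abstract does not give. Assumptions (not from the source): the STFT is laid out as (channels, frames, frequency bins), the IPD is encoded by its cosine and sine, the filter convention is Y = sum over c of conj(W_c) * X_c, and the function names `mag_ipd_features` and `filter_and_sum` are hypothetical.

```python
import numpy as np


def mag_ipd_features(stft: np.ndarray, ref: int = 0) -> np.ndarray:
    """Stack the reference-channel magnitude with cos/sin IPD of every other channel.

    Hypothetical helper. stft: complex multi-channel STFT, shape (C, T, F).
    Returns a real array of shape (1 + 2 * (C - 1), T, F).
    """
    ref_spec = stft[ref]                      # (T, F)
    others = np.delete(stft, ref, axis=0)     # (C - 1, T, F)
    # The phase of X_c * conj(X_ref) is the IPD, already wrapped to (-pi, pi].
    ipd = np.angle(others * np.conj(ref_spec))
    mag = np.abs(ref_spec)[np.newaxis]        # (1, T, F)
    return np.concatenate([mag, np.cos(ipd), np.sin(ipd)], axis=0)


def filter_and_sum(stft: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Filter each channel with complex weights and sum: Y = sum_c conj(W_c) * X_c.

    Hypothetical helper. weights must broadcast to (C, T, F): e.g. (C, 1, F)
    for time-invariant filters or (C, T, F) for filters that vary per TF bin.
    Returns the single-channel output STFT, shape (T, F).
    """
    return np.sum(np.conj(weights) * stft, axis=0)


# Toy two-microphone example: mic 1 equals mic 0 with a constant phase shift.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((10, 257)) + 1j * rng.standard_normal((10, 257))
phi = 0.5
stft = np.stack([x0, x0 * np.exp(1j * phi)])  # (2, 10, 257)

feats = mag_ipd_features(stft)
print(feats.shape)  # (3, 10, 257)

# Phase-align mic 1 to mic 0, then average the two channels.
weights = np.array([1.0, np.exp(1j * phi)])[:, np.newaxis, np.newaxis] / 2
enhanced = filter_and_sum(stft, weights)
print(np.allclose(enhanced, x0))  # True
```

In a neural filter-and-sum model the weights would be predicted per TF bin by the network; here they are fixed by hand so the result can be checked: aligning the phase-shifted second channel and averaging recovers the first channel exactly.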
Pages: 436-443
Number of pages: 8