Real-time Multi-channel Speech Enhancement Based on Neural Network Masking with Attention Model

Citations: 3
Authors
Xue, Cheng [1]
Huang, Weilong [1]
Chen, Weiguang [1]
Feng, Jinwei [2]
Affiliations
[1] Alibaba Grp, Speech Lab, Hangzhou, Peoples R China
[2] Alibaba Grp, Speech Lab, Sunnyvale, CA USA
Source
Keywords
real-time; multi-channel speech enhancement; beamforming; deep neural network;
DOI
10.21437/Interspeech.2021-2266
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification codes
100104; 100213;
Abstract
In this paper, we propose a real-time multi-channel speech enhancement method for noise reduction and dereverberation in far-field environments. The proposed method consists of two components: differential beamforming and a mask estimation network. Differential beamforming is employed to suppress interference signals from non-target directions so that a relatively clean speech signal can be obtained. The mask estimation network, which includes an attention model, is developed to capture the signal correlation among different channels in the feature extraction stage and to enhance the feature representation that needs to be reconstructed into the target speech in the mask estimation stage. In the inference phase, the spectrum after differential beamforming is filtered by the estimated mask to obtain the final output. The spectrum after differential beamforming provides a higher signal-to-noise ratio (SNR) than the original spectrum, so the estimated mask can filter out the noise more easily. We conducted experiments on the ConferencingSpeech2021 challenge (INTERSPEECH 2021) dataset to evaluate the proposed method. With only 2.9M parameters, the proposed method achieved competitive performance.
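The inference step described in the abstract (beamform first, then filter the beamformed spectrum with a mask) can be illustrated with a minimal sketch. This record does not give the paper's beamformer design, array geometry, or network architecture, so everything below is an assumption made for illustration: a two-microphone first-order delay-and-subtract differential beamformer, a 3 cm spacing, a real-valued mask in [0, 1], and a constant placeholder mask standing in for the network's output. The function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering_delay(freqs, mic_distance):
    """Per-frequency phase shift for the inter-microphone travel time."""
    tau = mic_distance / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freqs * tau)[:, None]


def differential_beamform(X, freqs, mic_distance=0.03):
    """First-order delay-and-subtract beamformer (assumed design).

    X has shape (2, n_freq, n_frames). The output has a spatial null
    toward the rear endfire direction (sound reaching mic 1 first).
    """
    return X[0] - steering_delay(freqs, mic_distance) * X[1]


def apply_mask(Y, mask):
    """Filter the beamformed spectrum with a real-valued mask in [0, 1]."""
    return np.clip(mask, 0.0, 1.0) * Y


def simulate_pair(S, freqs, from_front, mic_distance=0.03):
    """Toy plane-wave model: the farther microphone sees a delayed copy."""
    d = steering_delay(freqs, mic_distance)
    return np.stack([S, d * S]) if from_front else np.stack([d * S, S])


rng = np.random.default_rng(0)
freqs = np.linspace(0.0, 8000.0, 257)
S = rng.standard_normal((257, 10)) + 1j * rng.standard_normal((257, 10))

# A rear-arriving source is cancelled; a front-arriving source passes.
rear = differential_beamform(simulate_pair(S, freqs, from_front=False), freqs)
front = differential_beamform(simulate_pair(S, freqs, from_front=True), freqs)

# Placeholder for the mask a trained network would estimate.
mask = np.full(front.shape, 0.5)
enhanced = apply_mask(front, mask)
```

In this toy setup the beamformer removes the rear source before any mask is applied, which mirrors the abstract's point that the mask operates on a spectrum with a higher SNR than the raw microphone signal.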
Pages: 1862-1866
Number of pages: 5
Related papers
50 in total
  • [1] A Causal U-net based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement
    Ren, Xinlei
    Zhang, Xu
    Chen, Lianwu
    Zheng, Xiguang
    Zhang, Chen
    Guo, Liang
    Yu, Bing
    INTERSPEECH 2021, 2021, : 1832 - 1836
  • [2] A Neural Beamspace-Domain Filter for Real-Time Multi-Channel Speech Enhancement
    Liu, Wenzhe
    Li, Andong
    Wang, Xiao
    Yuan, Minmin
    Chen, Yi
    Zheng, Chengshi
    Li, Xiaodong
    SYMMETRY-BASEL, 2022, 14 (06):
  • [3] Time-Frequency Masking Based Online Multi-Channel Speech Enhancement With Convolutional Recurrent Neural Networks
    Chakrabarty, Soumitro
    Habets, Emanuel A. P.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 787 - 799
  • [4] COMBINING DEEP NEURAL NETWORKS AND BEAMFORMING FOR REAL-TIME MULTI-CHANNEL SPEECH ENHANCEMENT USING A WIRELESS ACOUSTIC SENSOR NETWORK
    Ceolini, Enea
    Liu, Shih-Chii
    2019 IEEE 29TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2019,
  • [5] TIME-FREQUENCY MASKING BASED ONLINE SPEECH ENHANCEMENT WITH MULTI-CHANNEL DATA USING CONVOLUTIONAL NEURAL NETWORKS
    Chakrabarty, Soumitro
    Wang, DeLiang
    Habets, Emanuel A. P.
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 476 - 480
  • [6] Dual channel neural network speech enhancement algorithm based on time frequency masking
    Jia, Hairong
    Mei, Shulin
    Zhang, Min
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2021, 49 (06): 43 - 49
  • [7] A Complex Neural Network Adaptive Beamforming for Multi-channel Speech Enhancement in Time Domain
    Jiang, Tao
    Liu, Hongqing
    Zhou, Yi
    Gan, Lu
    COMMUNICATIONS AND NETWORKING (CHINACOM 2021), 2022, : 129 - 139
  • [8] Beamforming and lightweight GRU neural network combination model for multi-channel speech enhancement
    Cao, Zhengdong
    Li, Dongmei
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (8-9) : 5677 - 5683
  • [9] Real-Time Speech Enhancement Based on Convolutional Recurrent Neural Network
    Girirajan, S.
    Pandian, A.
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 35 (02): 1987 - 2001
  • [10] Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement
    Dai, Wang
    Li, Xiaofei
    Politis, Archontis
    Virtanen, Tuomas
    32ND EUROPEAN SIGNAL PROCESSING CONFERENCE, EUSIPCO 2024, 2024, : 241 - 245