Real-time Multi-channel Speech Enhancement Based on Neural Network Masking with Attention Model

Cited by: 3
Authors
Xue, Cheng [1]
Huang, Weilong [1]
Chen, Weiguang [1]
Feng, Jinwei [2]
Affiliations
[1] Alibaba Group, Speech Lab, Hangzhou, People's Republic of China
[2] Alibaba Group, Speech Lab, Sunnyvale, CA, USA
Source
INTERSPEECH 2021 | 2021
Keywords
real-time; multi-channel speech enhancement; beamforming; deep neural network
DOI
10.21437/Interspeech.2021-2266
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
In this paper, we propose a real-time multi-channel speech enhancement method for noise reduction and dereverberation in far-field environments. The proposed method consists of two components: differential beamforming and a mask estimation network. Differential beamforming is employed to suppress interference signals from non-target directions so that relatively clean speech can be obtained. The mask estimation network, which uses an attention model, is developed to capture the signal correlation among different channels in the feature extraction stage and to enhance the feature representation that needs to be reconstructed into the target speech in the mask estimation stage. In the inference phase, the spectrum after differential beamforming is filtered by the estimated mask to obtain the final output. The spectrum after differential beamforming provides a higher signal-to-noise ratio (SNR) than the original spectrum, so the estimated mask can filter out the noise more easily. We conducted experiments on the ConferencingSpeech2021 challenge (INTERSPEECH 2021) dataset to evaluate the proposed method. With only 2.9M parameters, the proposed method achieved competitive performance.
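The inference step described in the abstract (beamform first, then filter the beamformed spectrum with an estimated mask) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a two-microphone first-order delay-and-subtract beamformer, a real-valued mask clipped to [0, 1], and the hypothetical helper names `differential_beamform`, `apply_mask`, and `plane_wave`. The mask estimation network itself is not modeled; the mask is simply passed in as an array.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def differential_beamform(stft, freqs_hz, mic_distance_m, null_angle_deg=180.0):
    """Two-microphone first-order (delay-and-subtract) differential beamformer.

    stft: complex array of shape (2, n_freqs, n_frames).
    Returns the beamformed spectrum of shape (n_freqs, n_frames), with a
    spatial null steered toward null_angle_deg (measured from the array axis).
    """
    tau = mic_distance_m / SPEED_OF_SOUND
    omega = 2.0 * np.pi * np.asarray(freqs_hz)
    # Phase weight that aligns a plane wave from the null direction on both
    # microphones, so that the subtraction cancels it.
    weight = np.exp(1j * omega * tau * np.cos(np.deg2rad(null_angle_deg)))
    return stft[0] - weight[:, None] * stft[1]


def apply_mask(beamformed, mask):
    """Filter the beamformed spectrum with a real-valued mask clipped to [0, 1]."""
    return np.clip(mask, 0.0, 1.0) * beamformed


def plane_wave(source, freqs_hz, mic_distance_m, angle_deg):
    """Simulate a far-field source: microphone 2 sees a phase-shifted copy."""
    tau = mic_distance_m / SPEED_OF_SOUND
    omega = 2.0 * np.pi * np.asarray(freqs_hz)
    phase = np.exp(-1j * omega * tau * np.cos(np.deg2rad(angle_deg)))
    return np.stack([source, phase[:, None] * source])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    freqs = np.linspace(100.0, 4000.0, 40)
    source = rng.standard_normal((40, 5)) + 1j * rng.standard_normal((40, 5))

    # A source in the null direction is cancelled; one in front passes through.
    rear = differential_beamform(plane_wave(source, freqs, 0.02, 180.0), freqs, 0.02)
    front = differential_beamform(plane_wave(source, freqs, 0.02, 0.0), freqs, 0.02)
    print("rear energy:", np.abs(rear).sum())
    print("front energy:", np.abs(front).sum())

    # Stand-in for the network output: a constant mask of 0.5.
    enhanced = apply_mask(front, np.full(front.shape, 0.5))
    print("enhanced shape:", enhanced.shape)
```

The ordering is the point of the sketch: the mask multiplies the beamformer output rather than a raw microphone channel, which matches the abstract's argument that the beamformed spectrum already has a higher SNR.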
Pages: 1862-1866
Page count: 5
Related papers
50 in total
  • [1] A Causal U-net based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement
    Ren, Xinlei
    Zhang, Xu
    Chen, Lianwu
    Zheng, Xiguang
    Zhang, Chen
    Guo, Liang
    Yu, Bing
    INTERSPEECH 2021, 2021: 1832 - 1836
  • [2] A Neural Beamspace-Domain Filter for Real-Time Multi-Channel Speech Enhancement
    Liu, Wenzhe
    Li, Andong
    Wang, Xiao
    Yuan, Minmin
    Chen, Yi
    Zheng, Chengshi
    Li, Xiaodong
    SYMMETRY-BASEL, 2022, 14 (06)
  • [3] COMBINING DEEP NEURAL NETWORKS AND BEAMFORMING FOR REAL-TIME MULTI-CHANNEL SPEECH ENHANCEMENT USING A WIRELESS ACOUSTIC SENSOR NETWORK
    Ceolini, Enea
    Liu, Shih-Chii
    2019 IEEE 29TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2019
  • [4] Channel-Time-Frequency Attention Module for Improved Multi-Channel Speech Enhancement
    Zeng, Xiao
    Wang, Mingjiang
    IEEE ACCESS, 2025, 13 : 44418 - 44427
  • [5] Beamforming and lightweight GRU neural network combination model for multi-channel speech enhancement
    Cao, Zhengdong
    Li, Dongmei
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (8-9) : 5677 - 5683
  • [6] All-Neural Multi-Channel Speech Enhancement
    Wang, Zhong-Qiu
    Wang, DeLiang
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 3234 - 3238
  • [7] A time-frequency fusion model for multi-channel speech enhancement
    Zeng, Xiao
    Xu, Shiyun
    Wang, Mingjiang
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [8] Real-Time Speech Enhancement Algorithm Based on Attention LSTM
    Liang, Ruiyu
    Kong, Fanliu
    Xie, Yue
    Tang, Guichen
    Cheng, Jiaming
    IEEE ACCESS, 2020, 8 : 48464 - 48476
  • [9] Real-time single-channel deep neural network-based speech enhancement on edge devices
    Shankar, Nikhil
    Bhat, Gautam Shreedhar
    Panahi, Issa M. S.
    INTERSPEECH 2020, 2020: 3281 - 3285
  • [10] A Feature Integration Network for Multi-Channel Speech Enhancement
    Zeng, Xiao
    Zhang, Xue
    Wang, Mingjiang
    SENSORS, 2024, 24 (22)