Real-time Multi-channel Speech Enhancement Based on Neural Network Masking with Attention Model

被引:3
|
作者
Xue, Cheng [1 ]
Huang, Weilong [1 ]
Chen, Weiguang [1 ]
Feng, Jinwei [2 ]
机构
[1] Alibaba Grp, Speech Lab, Hangzhou, Peoples R China
[2] Alibaba Grp, Speech Lab, Sunnyvale, CA USA
来源
关键词
real-time; multi-channel speech enhancement; beamforming; deep neural network;
D O I
10.21437/Interspeech.2021-2266
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
In this paper, we propose a real-time multi-channel speech enhancement method for noise reduction and dereverberation in far-field environments. The proposed method consists of two components: differential beamforming and mask estimation network. The differential beamforming is employed to suppress the interference signals from non-target directions such that a relatively clean speech can be obtained. The mask estimation network with an attention model is developed to capture the signal correlation among different channels in the feature extraction stage and enhance the feature representation that needs to be reconstructed into the target speech in the estimation mask stage. In the inference phase, the spectrum after differential beamforming is filtered by the estimated mask to obtain the final output. The spectrum after differential beamforming can provide a higher signal-to-noise ratio (SNR) than the original spectrum, so the estimated mask can more easily filter out the noise. We conducted experiments on the ConferencingSpeech2021 challenge (INTERSPEECH 2021) dataset to evaluate the proposed method. With only 2.9M parameters, the proposed method achieved competitive performance.
引用
收藏
页码:1862 / 1866
页数:5
相关论文
共 50 条
  • [21] A Feature Integration Network for Multi-Channel Speech Enhancement
    Zeng, Xiao
    Zhang, Xue
    Wang, Mingjiang
    SENSORS, 2024, 24 (22)
  • [22] The multi-channel AR model for real-time audio restoration
    Lin, H
    Godsill, S
    2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2005, : 335 - 338
  • [23] MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR MULTI-CHANNEL SPEECH ENHANCEMENT
    Zhang, Guochang
    Wang, Chunliang
    Yu, Libiao
    Wei, Jianqiang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9206 - 9210
  • [24] Real-time Multi-channel System for Neural Spikes Acquisition and Detection
    Huang, Li
    Zhang, Xu
    Guan, Ning
    Chen, Sanyuan
    Gui, Yun
    Pei, Weihua
    Chen, Hongda
    2012 IEEE 10TH INTERNATIONAL NEW CIRCUITS AND SYSTEMS CONFERENCE (NEWCAS), 2012, : 149 - 152
  • [25] A Novel Approach to Multi-Channel Speech Enhancement Based on Graph Neural Networks
    Chau, Hoang Ngoc
    Bui, Tien Dat
    Nguyen, Huu Binh
    Duong, Thanh Thi Hien
    Nguyen, Quoc Cuong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1133 - 1144
  • [26] MULTI-CHANNEL SPEECH ENHANCEMENT USING GRAPH NEURAL NETWORKS
    Tzirakis, Panagiotis
    Kumar, Anurag
    Donley, Jacob
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3415 - 3419
  • [27] AN ATTENTION-BASED NEURAL NETWORK APPROACH FOR SINGLE CHANNEL SPEECH ENHANCEMENT
    Hao, Xiang
    Shan, Changhao
    Xu, Yong
    Sun, Sining
    Xie, Lei
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6895 - 6899
  • [28] Underwater Image Enhancement Network Based on Multi-channel Hybrid Attention Mechanism
    Li Y.
    Sun S.
    Huang Q.
    Jing P.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2024, 46 (01): : 118 - 128
  • [29] A Neural Attention Model for Real-Time Network Intrusion Detection
    Tan, Mengxuan
    Iacovazzi, Alfonso
    Cheung, Ngai-Man
    Elovici, Yuval
    PROCEEDINGS OF THE IEEE LCN: 2019 44TH ANNUAL IEEE CONFERENCE ON LOCAL COMPUTER NETWORKS (LCN 2019), 2019, : 291 - 299
  • [30] A MODULATION-DOMAIN LOSS FOR NEURAL-NETWORK-BASED REAL-TIME SPEECH ENHANCEMENT
    Vuong, Tyler
    Xia, Yangyang
    Stern, Richard M.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6643 - 6647