Multi-channel Speech Enhancement Based on the MVDR Beamformer and Postfilter

被引:0
|
作者
Wang, Dujuan [1 ]
Bao, Changchun [1 ]
机构
[1] Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing, Peoples R China
来源
2020 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2020) | 2020年
基金
中国国家自然科学基金;
关键词
beamforming; speech enhancement; residual neural network; real and imaginary masks; postfilter;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Deep neural network (DNN) based ideal ratio mask (IRM) estimation methods have yielded good performance in monaural speech enhancement. Meanwhile, these methods have also shown considerable potential for beamforming and multichannel speech enhancement. It is crucial for minimum variance distortionless response (MVDR) beamformer to estimate the covariance matrix of the speech and noise accurately. The accurate estimation of time-frequency (T-F) mask has significant impact on the estimation of the covariance matrices. So, in this paper, a complex real and imaginary ratio mask (CRIRM) based MVDR beamformer for speech enhancement using residual network is proposed. First, the real and imaginary masks of speech and noise are estimated by taking advantage of a residual neural network. After that, the estimations of speech and noise are obtained by using the estimated masks. Finally, the covariance matrices of speech and noise are estimated, and applied into the MVDR beamformer. In addition, in order to further reduce residual noise interference, the output of the MVDR beamformer is further processed by an end-to-end monaural speech enhancement module. Experiments show that, the proposed method can better improve the quality and intelligibility of the enhanced speech.
引用
收藏
页数:5
相关论文
共 50 条
  • [21] All-Neural Multi-Channel Speech Enhancement
    Wang, Zhong-Qiu
    Wang, DeLiang
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3234 - 3238
  • [22] A Joint-Loss Approach for Speech Enhancement via Single-channel Neural Network and MVDR Beamformer
    Tan, Zhi-Wei
    Nguyen, Anh H. T.
    Tran, Linh T. T.
    Khong, Andy W. H.
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 841 - 849
  • [23] Generalized postfilter for speech quality enhancement
    Grancharov, Volodya
    Plasberg, Jan H.
    Samuelsson, Jonas
    Kleijn, W. Bastiaan
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (01): : 57 - 64
  • [24] Channel-Time-Frequency Attention Module for Improved Multi-Channel Speech Enhancement
    Zeng, Xiao
    Wang, Mingjiang
    IEEE ACCESS, 2025, 13 : 44418 - 44427
  • [25] Real-time dual-channel speech enhancement by VAD assisted MVDR beamformer for hearing aid applications using smartphone
    Shankar, Nikhil
    Bhat, Gautam Shreedhar
    Panahi, Issa M. S.
    42ND ANNUAL INTERNATIONAL CONFERENCES OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY: ENABLING INNOVATIVE TECHNOLOGIES FOR GLOBAL HEALTHCARE EMBC'20, 2020, : 952 - 955
  • [26] A Fast-Converging Adaptive Frequency-Domain MVDR Beamformer for Speech Enhancement
    Zhao, Shengkui
    Jones, Douglas L.
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1928 - 1931
  • [27] DEEP COMPLEX CONVOLUTIONAL RECURRENT NETWORK FOR MULTI-CHANNEL SPEECH ENHANCEMENT AND DEREVERBERATION
    Gelderblom, Femke B.
    Myrvoll, Tor Andre
    2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2021,
  • [28] MULTI-CHANNEL SPEECH ENHANCEMENT USING GRAPH NEURAL NETWORKS
    Tzirakis, Panagiotis
    Kumar, Anurag
    Donley, Jacob
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3415 - 3419
  • [29] A Phase-Based Time-Frequency masking for multi-channel speech enhancement in domestic environments
    Brutti, Alessio
    Tsiami, Antigoni
    Katsamanis, Athanasios
    Maragos, Petros
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2875 - 2879
  • [30] Noise eigenvalue modification methods for spatial subspace based multi-channel speech enhancement
    Kim, Gibak
    Cho, Nam Ik
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 573 - +