Multi-channel Speech Enhancement Based on the MVDR Beamformer and Postfilter

Cited by: 0

Authors:

Wang, Dujuan [1]

Bao, Changchun [1]

Affiliations:

[1] Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing, Peoples R China

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2020) | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

beamforming; speech enhancement; residual neural network; real and imaginary masks; postfilter;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Deep neural network (DNN) based ideal ratio mask (IRM) estimation methods have yielded good performance in monaural speech enhancement, and they have also shown considerable potential for beamforming and multichannel speech enhancement. The minimum variance distortionless response (MVDR) beamformer relies on accurate estimates of the speech and noise covariance matrices, and the accuracy of the time-frequency (T-F) mask has a significant impact on those estimates. This paper therefore proposes an MVDR beamformer for speech enhancement based on a complex real and imaginary ratio mask (CRIRM) estimated with a residual network. First, a residual neural network estimates the real and imaginary masks of the speech and the noise. Next, speech and noise estimates are obtained by applying the estimated masks. Finally, the covariance matrices of speech and noise are estimated and applied in the MVDR beamformer. To further reduce residual noise, the output of the MVDR beamformer is processed by an end-to-end monaural speech enhancement module. Experiments show that the proposed method yields better quality and intelligibility in the enhanced speech.
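
The mask-to-covariance-to-beamformer steps described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the masks are taken as given (the residual network that estimates them is omitted), the same complex mask is assumed to be applied to every channel, and the MVDR weights use the common reference-channel formulation w = (Phi_n^-1 Phi_s / trace(Phi_n^-1 Phi_s)) u_ref, which the record does not confirm as the paper's choice. The function names `spatial_covariance`, `mvdr_weights`, and `mask_based_mvdr` are hypothetical.

```python
import numpy as np

def spatial_covariance(X):
    """Per-frequency spatial covariance of X, shape (channels, freqs, frames).

    Returns Phi with shape (freqs, channels, channels), where
    Phi[f] = (1/T) * sum_t x(f, t) x(f, t)^H.
    """
    return np.einsum("cft,dft->fcd", X, X.conj()) / X.shape[-1]

def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-6):
    """MVDR weights per frequency, shape (freqs, channels).

    Assumed formulation: w = (Phi_n^-1 Phi_s / trace(Phi_n^-1 Phi_s)) u_ref.
    """
    C = phi_n.shape[-1]
    # Diagonal loading (an assumption) keeps Phi_n well conditioned.
    load = eps * np.trace(phi_n, axis1=-2, axis2=-1).real / C
    phi_n = phi_n + load[:, None, None] * np.eye(C)
    num = np.linalg.solve(phi_n, phi_s)               # Phi_n^-1 Phi_s
    tr = np.trace(num, axis1=-2, axis2=-1)[:, None]   # (freqs, 1)
    return num[..., ref] / (tr + 1e-12)               # column `ref`

def mask_based_mvdr(Y, mask_s, mask_n, ref=0):
    """Beamform the multichannel STFT Y, shape (channels, freqs, frames).

    mask_s and mask_n are real or complex T-F masks of shape
    (freqs, frames), broadcast over channels (an assumption).
    """
    S_hat = mask_s * Y   # speech estimate in every channel
    N_hat = mask_n * Y   # noise estimate in every channel
    w = mvdr_weights(spatial_covariance(S_hat),
                     spatial_covariance(N_hat), ref)
    # Output y(f, t) = w(f)^H Y(:, f, t)
    return np.einsum("fc,cft->ft", w.conj(), Y)
```

The single-channel output of `mask_based_mvdr` is what a monaural postfilter, such as the end-to-end module mentioned in the abstract, would then process; that stage is not sketched here because the record gives no details of it.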

Pages: 5

50 entries in total

[31] MULTI-CHANNEL SPEAKER VERIFICATION WITH CONV-TASNET BASED BEAMFORMER
Mosner, Ladislav
Plchot, Oldrich
Burget, Lukas
Cernocky, Jan "Honza"
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022 : 7982 - 7986
[32] Two-stage UNet with channel and temporal-frequency attention for multi-channel speech enhancement
Xu, Shiyun
Cao, Yinghan
Zhang, Zehua
Wang, Mingjiang
SPEECH COMMUNICATION, 2025, 166
[33] Beamforming and lightweight GRU neural network combination model for multi-channel speech enhancement
Cao, Zhengdong
Li, Dongmei
SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (8-9) : 5677 - 5683
[34] Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR
Wang, Zhong-Qiu
Wang, Peidong
Wang, DeLiang
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1778 - 1787
[35] INCORPORATING MULTI-CHANNEL WIENER FILTER WITH SINGLE-CHANNEL SPEECH ENHANCEMENT ALGORITHM
Yong, Pei Chee
Nordholm, Sven
Dam, Hai Huyen
Leung, Yee Hong
Lai, Chiong Ching
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013 : 7284 - 7288
[36] A Multi-Channel Noise Estimator Based on Improved Minima Controlled Recursive Averaging for Speech Enhancement
Tangsangiumvisai, Nisachon
ENGINEERING JOURNAL-THAILAND, 2023, 27 (11) : 99 - 112
[37] ADL-MVDR: ALL DEEP LEARNING MVDR BEAMFORMER FOR TARGET SPEECH SEPARATION
Zhang, Zhuohuang
Xu, Yong
Yu, Meng
Zhang, Shi-Xiong
Chen, Lianwu
Yu, Dong
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021 : 6089 - 6093
[38] Postfilter for Dual Channel Speech Enhancement Using Coherence and Statistical Model-Based Noise Estimation
Cheong, Sein
Kim, Minseung
Shin, Jong Won
SENSORS, 2024, 24 (12)
[39] Speech Enhancement Using Improved Generalized Sidelobe Canceller in Frequency Domain with Multi-channel Postfiltering
Li, Kai
Fu, Qiang
Yan, Yonghong
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010 : 973 - 976
[40] Multi-Channel Speech Enhancement and Amplitude Modulation Analysis for Noise Robust Automatic Speech Recognition
Moritz, Niko
Adiloglu, Kamil
Anemueller, Joern
Goetze, Stefan
Kollmeier, Birger
COMPUTER SPEECH AND LANGUAGE, 2017, 46 : 558 - 573
