Beamforming-based Speech Enhancement based on Optimal Ratio Mask

Times cited: 0
Authors
Ji, Qiang [1]
Bao, Changchun [1]
Cheng, Rui [1]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China
Source
CONFERENCE PROCEEDINGS OF 2019 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2019) | 2019
Funding
National Natural Science Foundation of China;
Keywords
Speech enhancement; beamforming; time-frequency mask; neural networks; MULTICHANNEL WIENER FILTER;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Speech enhancement in noisy and reverberant environments remains a challenging task. Acoustic beamforming with the minimum variance distortionless response (MVDR) criterion has been shown to be effective in this setting. The crucial issue in MVDR-based speech enhancement is obtaining accurate estimates of the speech and noise spatial covariance matrices (SCMs). Time-frequency masking is a reliable way to estimate the SCMs and can therefore improve the performance of the MVDR beamformer in speech enhancement. In this paper, an optimal ratio mask-based method for MVDR beamforming is proposed. Specifically, convolutional neural networks (CNNs) operate on the magnitude and phase components of the short-time Fourier transform (STFT) of the microphone signals to estimate the optimal ratio masks, and these masks are used to compute the SCMs from which the MVDR beamformer is constructed. Experiments are conducted on simulated data. The results show that the proposed method is more robust than the reference methods under adverse acoustic conditions.
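The mask-to-beamformer step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`masked_scm`, `mvdr_weights`, `apply_beamformer`), the array layout (channels, frequencies, frames), the diagonal loading, and the reference-microphone MVDR formulation w = (Φ_n⁻¹ Φ_s) u / tr(Φ_n⁻¹ Φ_s) are all assumptions made for illustration, and the masks are taken as given rather than predicted by a CNN.

```python
import numpy as np

def masked_scm(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrix per frequency bin.

    stft: complex array, shape (C, F, T) = (channels, frequencies, frames)
    mask: real array, shape (F, T), e.g. an estimated speech or noise mask
    returns: complex array, shape (F, C, C)
    """
    # Sum over frames of mask * y y^H, then normalise by the mask energy.
    weighted = np.einsum("ft,cft,dft->fcd", mask, stft, stft.conj())
    return weighted / (mask.sum(axis=1)[:, None, None] + eps)

def mvdr_weights(scm_speech, scm_noise, ref_mic=0, diag_load=1e-6):
    """MVDR weights in the reference-microphone formulation (assumed here).

    scm_speech, scm_noise: shape (F, C, C)
    returns: complex weights, shape (F, C)
    """
    num_ch = scm_noise.shape[-1]
    # Light diagonal loading keeps the noise SCM well conditioned.
    avg_power = np.trace(scm_noise, axis1=-2, axis2=-1).real / num_ch
    scm_noise = scm_noise + diag_load * avg_power[:, None, None] * np.eye(num_ch)
    # Solve Phi_n X = Phi_s for every frequency instead of inverting Phi_n.
    numerator = np.linalg.solve(scm_noise, scm_speech)
    trace = np.trace(numerator, axis1=-2, axis2=-1)[:, None]
    return numerator[:, :, ref_mic] / (trace + 1e-12)

def apply_beamformer(weights, stft):
    """Apply w^H y per time-frequency bin; returns shape (F, T)."""
    return np.einsum("fc,cft->ft", weights.conj(), stft)
```

Given a multichannel STFT `Y` and a speech mask `m` (both hypothetical inputs), one plausible usage is `w = mvdr_weights(masked_scm(Y, m), masked_scm(Y, 1.0 - m))` followed by `apply_beamformer(w, Y)`; using `1.0 - m` as the noise mask is itself an assumption that only makes sense for masks bounded in [0, 1].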
Pages: 5
Related papers
25 entries in total
[1]   IMAGE METHOD FOR EFFICIENTLY SIMULATING SMALL-ROOM ACOUSTICS [J].
ALLEN, JB ;
BERKLEY, DA .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1979, 65 (04) :943-950
[2]  
[Anonymous], 2016, CHIME 4 WORKSH
[3]  
Barker J., 2018, COMPUT SPEECH LANG, V27, P621
[4]  
Benesty J, 2008, J Acoust Soc Am, V125, P4097
[5]   On the importance of early reflections for speech in rooms [J].
Bradley, JS ;
Sato, H ;
Picard, M .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2003, 113 (06) :3233-3244
[6]  
Chakrabarty S, 2018, INT WORKSH ACOUSTIC, P476, DOI 10.1109/IWAENC.2018.8521346
[7]  
DiBiase JH, 2001, DIGITAL SIGNAL PROC, P157
[8]   A CURRENT DISTRIBUTION FOR BROADSIDE ARRAYS WHICH OPTIMIZES THE RELATIONSHIP BETWEEN BEAM WIDTH AND SIDE-LOBE LEVEL [J].
DOLPH, CL .
PROCEEDINGS OF THE INSTITUTE OF RADIO ENGINEERS, 1946, 34 (06) :335-348
[9]  
E. A. P. Habets, 2016, ROOM IMPULSE RESPONS
[10]   A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation [J].
Gannot, Sharon ;
Vincent, Emmanuel ;
Markovich-Golan, Shmulik ;
Ozerov, Alexey .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (04) :692-730