Multi-Channel Bin-Wise Speech Separation Combining Time-Frequency Masking and Beamforming

Times cited: 0
Authors
Bella, Mostafa [1, 2]
Saylani, Hicham [2]
Hosseini, Shahram [1]
Deville, Yannick [1]
Affiliations
[1] Univ Toulouse, CNES, UPS, IRAP, CNRS, F-31400 Toulouse, France
[2] Univ Ibnou Zohr, Fac Sci, MatSim, Agadir 80000, Morocco
Keywords
Blind source separation; convolutive mixtures; speech separation; sparsity; TF masking; beamforming; mixtures; signals; robust; model
DOI
10.1109/ACCESS.2023.3315596
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a novel Blind Source Separation (BSS) method for convolutive mixtures, including underdetermined ones. The method combines Time-Frequency (TF) masking and beamforming and exploits the sparsity of the source signals in the TF domain. TF masking-based methods can perform remarkably well, even in the underdetermined case, but they tend to introduce unwanted artifacts into the separated signals. Beamforming techniques, in contrast, do not distort the estimated signals, but they perform satisfactorily only in the determined and overdetermined cases. Combining the two approaches lets us leverage their respective strengths. First, we exploit the TF sparsity of the sources to estimate probabilistic "bin-wise" masks by modeling the frequency observation vectors with a complex Gaussian Mixture Model (GMM) whose parameters are estimated with an Expectation-Maximization (EM) algorithm. Because the EM algorithm is sensitive to initialization, we propose selecting the initial values of the model parameters using the Hermitian angles between the frequency observation vectors and a reference vector. Next, we use the estimated TF masks to estimate the Relative Transfer Functions (RTFs) of each source. Finally, we propose a new technique for estimating the spatial images of the separated sources, which can be regarded as an underdetermined extension of the Linearly Constrained Minimum Power (LCMP) beamformer. In tests, our method performed well in both the determined and underdetermined cases compared with various existing methods based on similar working hypotheses.
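The processing chain summarized in the abstract can be illustrated at a single frequency bin with the NumPy sketch below. It is not the authors' implementation: the function names, the first-microphone unit vector used as the reference vector, the equal-width angle binning used to build initial masks, the principal-eigenvector RTF estimator and the toy sparse (one active source per TF point) mixture are all assumptions made for illustration. The EM refinement of the complex GMM is omitted, and the last step is the standard LCMP beamformer for the determined case, not the paper's underdetermined extension.

```python
import numpy as np


def hermitian_angles(X, ref):
    """Hermitian angle (radians, in [0, pi/2]) between each column of X (M x T) and ref (M,)."""
    num = np.abs(ref.conj() @ X)
    den = np.linalg.norm(X, axis=0) * np.linalg.norm(ref) + 1e-12
    return np.arccos(np.clip(num / den, 0.0, 1.0))


def init_masks(angles, K):
    """Hypothetical initialization: split the angle range into K equal-width bins.

    Returns hard (0/1) masks of shape (K, T) that could seed an EM algorithm.
    """
    edges = np.linspace(angles.min(), angles.max(), K + 1)
    labels = np.clip(np.searchsorted(edges, angles, side="right") - 1, 0, K - 1)
    return np.eye(K)[labels].T


def estimate_rtf(X, mask, ref_mic=0):
    """Assumed RTF estimator: principal eigenvector of the mask-weighted spatial
    covariance, normalized so that its ref_mic entry equals 1."""
    R = (mask * X) @ X.conj().T / max(mask.sum(), 1e-12)
    _, vecs = np.linalg.eigh(R)  # eigenvalues are returned in ascending order
    h = vecs[:, -1]
    return h / h[ref_mic]


def lcmp_weights(R, C):
    """Standard LCMP weights W = R^-1 C (C^H R^-1 C)^-1, so that C^H W = I.

    Column k passes source k with unit response and nulls the other sources.
    Requires C to have full column rank (at least as many mics as sources).
    """
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.inv(C.conj().T @ Rinv_C)


# Toy single-bin mixture: M = 3 mics, K = 2 sources, one active source per TF point.
rng = np.random.default_rng(0)
M, T, K = 3, 2000, 2
H = np.array([[1.0, 1.0],
              [0.8 * np.exp(0.3j), -0.6 * np.exp(1.2j)],
              [0.5 * np.exp(-1.0j), 0.9 * np.exp(0.5j)]])  # true RTFs (mic 0 = reference)
active = rng.integers(0, K, size=T)
S = np.zeros((K, T), dtype=complex)
S[active, np.arange(T)] = rng.uniform(0.5, 1.5, T) * np.exp(2j * np.pi * rng.random(T))
X = H @ S + 1e-3 * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))

angles = hermitian_angles(X, ref=np.eye(M)[0])  # hypothetical reference vector
masks = init_masks(angles, K)                    # EM refinement of the CGMM omitted
C = np.stack([estimate_rtf(X, m) for m in masks], axis=1)
W = lcmp_weights(X @ X.conj().T / T, C)
Y = W.conj().T @ X  # row k: estimated image of one source at the reference mic
```

Because cos θ_H = |r^H x| / (||r|| ||x||) does not depend on the complex amplitude of a single active source (x = h s gives |r^H h| / (||r|| ||h||)), all bins dominated by the same source share roughly one Hermitian angle, which is what makes these angles usable for initialization. The LCMP step needs at least as many microphones as sources (here M = 3, K = 2); this is the determined-case restriction that the proposed underdetermined extension of LCMP is meant to lift.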
Pages: 100632-100645
Number of pages: 14