Multi-Channel Bin-Wise Speech Separation Combining Time-Frequency Masking and Beamforming

Cited by: 0
Authors
Bella, Mostafa [1, 2]
Saylani, Hicham [2]
Hosseini, Shahram [1]
Deville, Yannick [1]
Affiliations
[1] Univ Toulouse, CNES, UPS, IRAP,CNRS, F-31400 Toulouse, France
[2] Univ Ibnou Zohr, Fac Sci, MatSim, Agadir 80000, Morocco
Keywords
Blind source separation; convolutive mixtures; speech separation; sparsity; TF masking; beamforming; mixtures; signals; robust; model
DOI
10.1109/ACCESS.2023.3315596
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper presents a novel Blind Source Separation method that can handle convolutive mixtures, including underdetermined ones. The method combines Time-Frequency (TF) masking and beamforming and exploits the sparsity of the source signals in the TF domain. TF masking-based methods can achieve remarkable performance, even in the underdetermined case, but they tend to introduce unwanted artifacts into the separated signals. Beamforming techniques, by contrast, do not distort the estimated signals but achieve satisfactory performance only in the overdetermined and determined cases. Combining the two approaches leverages their respective strengths. First, we exploit the sparsity of the source signals in the TF domain to estimate probabilistic "bin-wise" masks by modeling the frequency observation vectors with a complex Gaussian Mixture Model and fitting it with an Expectation-Maximization (EM) algorithm. Because the EM algorithm is sensitive to initialization, we propose selecting the initial values of the model parameters using Hermitian angles between the frequency observation vectors and a reference vector. Then, we use the estimated TF masks to estimate the Relative Transfer Functions of each source. Finally, we propose a new technique for estimating the spatial images of the separated sources, which can be regarded as an underdetermined extension of the Linearly Constrained Minimum Power beamformer. In tests, our method performed well in both the determined and underdetermined cases compared to various existing methods with similar working hypotheses.
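The abstract mentions Hermitian angles between observation vectors and a reference vector but gives neither the formula nor the choice of reference. As a minimal sketch only, the block below uses the standard definition cos(theta) = |u^H v| / (||u|| ||v||); the function name `hermitian_angle`, the reference vector, and the sample observations are hypothetical and not taken from the paper.

```python
import math

def hermitian_angle(u, v):
    """Hermitian angle in radians between two non-zero complex vectors.

    Standard definition: cos(theta) = |u^H v| / (||u|| * ||v||).
    Assumption: this is a generic formula, not the paper's exact procedure.
    """
    inner = sum(a.conjugate() * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(abs(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(abs(b) ** 2 for b in v))
    cos_theta = abs(inner) / (norm_u * norm_v)
    # Clamp to 1.0 so floating-point rounding cannot push acos out of range.
    return math.acos(min(1.0, cos_theta))

# Hypothetical two-microphone data: one reference vector and three
# observation vectors for a single frequency bin.
reference = [1 + 0j, 1 + 0j]
observations = [
    [1 + 0j, 0 + 1j],   # partly aligned with the reference
    [2 + 0j, 2 + 0j],   # scaled copy of the reference
    [1 + 0j, -1 + 0j],  # orthogonal to the reference
]
angles = [hermitian_angle(x, reference) for x in observations]
```

Because the angle is unchanged when either vector is multiplied by a non-zero complex scalar, observation vectors that differ only in gain and phase yield the same value, which is one reason such an angle can serve to group TF bins before a clustering step.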
Pages: 100632-100645
Number of pages: 14