Indeterminacy Free Frequency-Domain Blind Separation of Reverberant Audio Sources

Cited by: 7
Authors
Di Persia, Leandro [1]
Milone, Diego [1]
Yanagida, Masuzo [2]
Affiliations
[1] UNL, CONICET, Fac Ingn & Ciencias Hidricas, RA-3000 Santa Fe, Argentina
[2] Doshisha Univ, Fac Engn, Intelligent Informat Engn & Sci Dept, Kyotanabe 6100321, Japan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2009, Vol. 17, No. 2
Keywords
Blind source separation (BSS); reverberation; independent component analysis (ICA); speech enhancement; SPEECH RECOGNITION; CONVOLUTIVE MIXTURES; TIME; ROBUST; NOISE
DOI
10.1109/TASL.2008.2009568
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Blind separation of convolutive mixtures is a very complicated task that has applications in many fields of speech and audio processing, such as hearing aids and man-machine interfaces. One of the proposed solutions is frequency-domain independent component analysis. The main disadvantage of this method is the presence of permutation ambiguities among consecutive frequency bins. Moreover, this problem worsens as reverberation time increases. This paper presents a new frequency-domain method that uses a simplified mixing model, in which the impulse responses from one source to each microphone are expressed as scaled and delayed versions of one of these impulse responses. This assumption, based on the similarity among the waveforms of the impulse responses, is valid for a small microphone spacing. Under this model, separation is performed without any permutation or amplitude ambiguity among consecutive frequency bins. The new method is aimed mainly at obtaining separation, with a small reduction of reverberation. Nevertheless, as the reverberation is included in the model, the new method is capable of performing separation for a wide range of reverberant conditions, with very high speed. The separation quality is evaluated using a perceptually designed objective measure. In addition, an automatic speech recognition system is used to test the advantages of the algorithm in a real application. Very good results are obtained for both artificial and real mixtures. The results are significantly better than those of other standard blind source separation algorithms.
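The simplified mixing model in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes two sources and two microphones, takes the first microphone as the reference, and factors out each source's common reference response. All names and numeric values (`mixing_matrix`, `gains`, `delays`, the sample rate and FFT size) are hypothetical. The gains and delays are treated as known here; the abstract does not say how the paper estimates them.

```python
import cmath
import math


def mixing_matrix(omega, gains, delays):
    """Per-bin 2x2 mixing matrix under the scaled-and-delayed assumption.

    Row 0 is the reference microphone. Row 1 sees source j scaled by
    gains[j] and delayed by delays[j] seconds relative to the reference.
    The common reference response of each source is factored out (set to 1).
    """
    return [
        [1.0 + 0j, 1.0 + 0j],
        [g * cmath.exp(-1j * omega * d) for g, d in zip(gains, delays)],
    ]


def invert_2x2(m):
    """Closed-form inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


# Hypothetical parameters: one (gain, delay) pair per source, shared by all bins.
gains = (0.9, 0.8)
delays = (1.0e-4, -1.5e-4)  # seconds, relative to the reference microphone
sample_rate = 8000          # Hz, assumed
n_fft = 512                 # assumed STFT size

# Build the separation matrix for every bin from the same four parameters
# and check that it undoes the mixing (W * H is the identity in each bin).
max_error = 0.0
for k in range(1, n_fft // 2):
    omega = 2 * math.pi * k * sample_rate / n_fft
    h = mixing_matrix(omega, gains, delays)
    w = invert_2x2(h)
    product = matmul_2x2(w, h)
    for i in range(2):
        for j in range(2):
            target = 1.0 if i == j else 0.0
            max_error = max(max_error, abs(product[i][j] - target))
```

In this sketch every bin's separation matrix is derived from the same gain and delay parameters, so the source order and scaling are the same in all bins. This is one way to read the abstract's claim that the model avoids permutation and amplitude ambiguities among frequency bins.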
Pages: 299-311
Number of pages: 13