Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model

Cited by: 348
Authors
Duong, Ngoc Q. K. [1]
Vincent, Emmanuel [1]
Gribonval, Remi [1]
Affiliations
[1] INRIA, Ctr Inria Rennes Bretagne Atlantique, F-35042 Rennes, France
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 07
Keywords
Convolutive blind source separation (BSS); expectation-maximization (EM) algorithm; permutation problem; spatial covariance models; under-determined mixtures; MAXIMUM-LIKELIHOOD; BLIND SEPARATION; FREQUENCY; MIXTURES;
DOI
10.1109/TASL.2010.2050716
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
This paper addresses the modeling of reverberant recording environments in the context of under-determined convolutive blind source separation. We model the contribution of each source to all mixture channels in the time-frequency domain as a zero-mean Gaussian random variable whose covariance encodes the spatial characteristics of the source. We then consider four specific covariance models, including a full-rank unconstrained model. We derive a family of iterative expectation-maximization (EM) algorithms to estimate the parameters of each model and propose suitable procedures adapted from the state-of-the-art to initialize the parameters and to align the order of the estimated sources across all frequency bins. Experimental results over reverberant synthetic mixtures and live recordings of speech data show the effectiveness of the proposed approach.
Pages: 1830-1840
Page count: 11
Related papers
25 in total
[1]  
Araki S, 2009, LECT NOTES COMPUT SC, V5441, P742, DOI 10.1007/978-3-642-00599-2_93
[2]  
Arberet S, 2009, LECT NOTES COMPUT SC, V5441, P751, DOI 10.1007/978-3-642-00599-2_94
[3]   Underdetermined blind separation of delayed sound sources in the frequency domain [J].
Bofill, P.
NEUROCOMPUTING, 2003, 55 (3-4) :627-641
[4]  
CARDOSO JF, 2002, P EUSIPCO, V1, P561
[5]  
COBOS M, IEEE T AUDIO S UNPUB
[6]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP;
LAIRD, NM;
RUBIN, DB.
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[7]  
DUONG NQK, 2009, P IEEE WORKSH APPL S, P129
[8]  
DUONG NQK, 2010, P IEEE INT C AC SPEE
[9]  
ELCHAMI Z, 2008, P INT WORKSH AC ECH
[10]   Maximum likelihood approach for blind audio source separation using time-frequency Gaussian source models [J].
Févotte, C;
Cardoso, JF.
2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2005, :78-81