Spatial location priors for Gaussian model based reverberant audio source separation

Cited by: 0
Authors
Ngoc Q K Duong
Emmanuel Vincent
Rémi Gribonval
Affiliations
[1] Technicolor Rennes Research & Innovation Center
[2] Inria
[3] Inria
Source
EURASIP Journal on Advances in Signal Processing, vol. 2013
Keywords
Audio source separation; Spatial covariance; EM algorithm; Probabilistic priors; Inverse-Wishart; Gaussian
DOI
Not available
Abstract
We consider the Gaussian framework for reverberant audio source separation, where the sources are modeled in the time-frequency domain by their short-term power spectra and their spatial covariance matrices. We propose two alternative probabilistic priors over the spatial covariance matrices which are consistent with the theory of statistical room acoustics and we derive expectation-maximization algorithms for maximum a posteriori (MAP) estimation. We argue that these algorithms provide a statistically principled solution to the permutation problem and to the risk of overfitting resulting from conventional maximum likelihood (ML) estimation. We show experimentally that in a semi-informed scenario where the source positions and certain room characteristics are known, the MAP algorithms outperform their ML counterparts. This opens the way to rigorous statistical treatment of this family of models in other scenarios in the future.