Multichannel Extensions of Non-Negative Matrix Factorization With Complex-Valued Data

Cited by: 222
Authors
Sawada, Hiroshi [1]
Kameoka, Hirokazu [1]
Araki, Shoko [1]
Ueda, Naonori [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / No. 05
Keywords
Blind source separation; clustering; convolutive mixture; multichannel; non-negative matrix factorization; SOURCE SEPARATION; AUDIO; ALGORITHMS; BASES
DOI
10.1109/TASL.2013.2239990
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents new formulations and algorithms for multichannel extensions of non-negative matrix factorization (NMF). The formulations employ Hermitian positive semidefinite matrices to represent a multichannel version of non-negative elements. Multichannel Euclidean distance and multichannel Itakura-Saito (IS) divergence are defined based on appropriate statistical models utilizing multivariate complex Gaussian distributions. To minimize this distance/divergence, efficient optimization algorithms in the form of multiplicative updates are derived by using properly designed auxiliary functions. Two methods are proposed for clustering NMF bases according to the estimated spatial property. Convolutive blind source separation (BSS) is performed by the multichannel extensions of NMF with the clustering mechanism. Experimental results show that 1) the derived multiplicative update rules exhibited good convergence behavior, and 2) BSS tasks for several music sources with two microphones and three instrumental parts were evaluated successfully.
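The abstract names a multichannel Itakura-Saito (IS) divergence between Hermitian positive semidefinite matrices but does not state its formula. The sketch below assumes the log-determinant form tr(X Y^-1) - log det(X Y^-1) - M, which may differ in detail from the paper's definition; the function names and the `random_hpd` helper are hypothetical and introduced only for illustration. It shows that for a single channel (M = 1) this form reduces to the scalar IS divergence x/y - log(x/y) - 1.

```python
import numpy as np


def multichannel_is_divergence(X, Y):
    """Assumed form: tr(X Y^-1) - log det(X Y^-1) - M, for Hermitian positive definite X, Y."""
    M = X.shape[0]
    P = X @ np.linalg.inv(Y)
    # det(X Y^-1) = det(X) / det(Y) is real and positive here, so log|det| suffices.
    _, logabsdet = np.linalg.slogdet(P)
    # The trace of a product of two Hermitian matrices is real; drop rounding noise.
    return float(np.real(np.trace(P)) - logabsdet - M)


def scalar_is_divergence(x, y):
    """Scalar Itakura-Saito divergence between positive numbers x and y."""
    return x / y - np.log(x / y) - 1.0


def random_hpd(M, seed):
    """Hypothetical helper: a random M x M Hermitian positive definite matrix."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    return A @ A.conj().T + 0.1 * np.eye(M)


if __name__ == "__main__":
    # With M = 1 the matrix form and the scalar form agree.
    print(multichannel_is_divergence(np.array([[2.0]]), np.array([[0.5]])))
    print(scalar_is_divergence(2.0, 0.5))
    # Two-channel case, matching the two-microphone setting in the abstract.
    print(multichannel_is_divergence(random_hpd(2, 1), random_hpd(2, 2)))
```

Under this assumed form the divergence is zero when the two matrices are equal and positive otherwise, which is the property a multiplicative-update scheme would drive toward when fitting a model matrix to an observed one.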
Pages: 971-982
Page count: 12