Multichannel Extensions of Non-Negative Matrix Factorization With Complex-Valued Data

Cited by: 222

Authors:

Sawada, Hiroshi [1]

Kameoka, Hirokazu [1]

Araki, Shoko [1]

Ueda, Naonori [1]

Affiliations:

[1] NTT Corporation, NTT Communication Science Laboratories, Kyoto 619-0237, Japan

Source:

IEEE Transactions on Audio, Speech, and Language Processing | 2013 / Vol. 21 / No. 5

Keywords:

Blind source separation; clustering; convolutive mixture; multichannel; non-negative matrix factorization; source separation; audio; algorithms; bases

DOI:

10.1109/TASL.2013.2239990

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper presents new formulations and algorithms for multichannel extensions of non-negative matrix factorization (NMF). The formulations employ Hermitian positive semidefinite matrices to represent a multichannel version of non-negative elements. Multichannel Euclidean distance and multichannel Itakura-Saito (IS) divergence are defined based on appropriate statistical models utilizing multivariate complex Gaussian distributions. To minimize this distance/divergence, efficient optimization algorithms in the form of multiplicative updates are derived by using properly designed auxiliary functions. Two methods are proposed for clustering NMF bases according to the estimated spatial property. Convolutive blind source separation (BSS) is performed by the multichannel extensions of NMF with the clustering mechanism. Experimental results show that 1) the derived multiplicative update rules exhibited good convergence behavior, and 2) BSS tasks for several music sources with two microphones and three instrumental parts were evaluated successfully.
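As an illustration of the two measures named in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes the multichannel IS divergence takes the common log-determinant form tr(X Y^-1) - log det(X Y^-1) - M for M x M Hermitian matrices, and that the multichannel Euclidean distance is the squared Frobenius norm of the difference. Both inputs are assumed strictly positive definite so that the inverse and the log-determinant exist. The function names and the helper `random_hpd` are hypothetical and do not come from the paper.

```python
import numpy as np


def multichannel_is_divergence(X, Y):
    """Log-det form: tr(X Y^-1) - log det(X Y^-1) - M.

    Assumes X and Y are M x M Hermitian positive definite matrices.
    """
    M = X.shape[0]
    # tr(X Y^-1) equals tr(Y^-1 X); solve() avoids forming an explicit inverse.
    trace_term = np.trace(np.linalg.solve(Y, X)).real
    # log det(X Y^-1) = log det X - log det Y, computed stably with slogdet.
    _, logdet_x = np.linalg.slogdet(X)
    _, logdet_y = np.linalg.slogdet(Y)
    return trace_term - (logdet_x - logdet_y) - M


def multichannel_euclidean(X, Y):
    """Squared Frobenius norm of the difference X - Y."""
    return np.linalg.norm(X - Y, ord="fro") ** 2


def random_hpd(M, rng):
    """Hypothetical helper: draw a random M x M Hermitian positive definite matrix."""
    A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    return A @ A.conj().T + M * np.eye(M)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, Y = random_hpd(2, rng), random_hpd(2, rng)
    print("IS divergence:", multichannel_is_divergence(X, Y))
    print("Euclidean distance:", multichannel_euclidean(X, Y))
```

With M = 1 the first function reduces to the scalar Itakura-Saito divergence x/y - log(x/y) - 1, which is one way to see in what sense Hermitian positive semidefinite matrices act as a multichannel counterpart of non-negative scalars.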


Pages: 971-982

Page count: 12
