Speech/Music Separation Using Non-negative Matrix Factorization with Combination of Cost Functions

Cited by: 0

Authors:

Nasersharif, Babak [1]

Abdali, Sara [1]

Affiliations:

[1] KN Toosi Univ Technol, Fac Comp Engn, Tehran, Iran

Source:

2015 INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP) | 2015

Keywords:

Non-negative Matrix Factorization (NMF); Itakura-Saito divergence; Kullback-Leibler divergence; Single Channel Source Separation; Speech; Music;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Non-negative Matrix Factorization (NMF) is one approach to separating speech from music in single-channel source separation. In this approach, the spectrogram of each source signal is factorized as the product of two matrices, known as the basis and weight matrices. To obtain a good estimate of the signal spectrogram, the weight and basis matrices are updated iteratively. A cost function is typically used to measure the distance between the signal and its estimate. Different cost functions have been introduced based on the Kullback-Leibler (KL) and Itakura-Saito (IS) divergences. The IS divergence is scale-invariant, so it is suitable for conditions in which the signal coefficients have a large dynamic range, for example in music short-term spectra. Based on this property of the IS divergence, in this paper we propose using the IS divergence as the NMF cost function in the training stage for music and the KL divergence as the NMF cost function in the training stage for speech. Moreover, in the decomposition stage, we propose using a linear combination of these two divergences together with a regularization term that incorporates temporal continuity information as prior knowledge. Experimental results on one hour of speech and music show a good trade-off between the signal-to-interference ratio (SIR) of speech and music in comparison to conventional NMF methods.
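The ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no update rules or weights, so the code below assumes the standard multiplicative updates for KL-NMF and IS-NMF, and the function names, the mixing weight `alpha`, the regularization weight `lam`, and the squared-difference form of the temporal continuity penalty are all hypothetical choices made for illustration.

```python
import numpy as np

EPS = 1e-12  # small floor to avoid division by zero and log(0)


def kl_divergence(V, V_hat):
    """Generalized Kullback-Leibler divergence D_KL(V || V_hat), summed over entries."""
    V = np.maximum(V, EPS)
    V_hat = np.maximum(V_hat, EPS)
    return float(np.sum(V * np.log(V / V_hat) - V + V_hat))


def is_divergence(V, V_hat):
    """Itakura-Saito divergence D_IS(V || V_hat), summed over entries."""
    ratio = np.maximum(V, EPS) / np.maximum(V_hat, EPS)
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def nmf(V, rank, cost="kl", n_iter=200, seed=0):
    """Factorize a non-negative spectrogram V ~= B @ W.

    B is the basis matrix (freq x rank), W the weight matrix (rank x frames).
    Uses the standard multiplicative updates (an assumption; the abstract
    does not state which update rules the paper uses).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    B = rng.random((n_freq, rank)) + EPS
    W = rng.random((rank, n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        V_hat = B @ W + EPS
        if cost == "kl":
            W *= (B.T @ (V / V_hat)) / (B.T @ ones + EPS)
            V_hat = B @ W + EPS
            B *= ((V / V_hat) @ W.T) / (ones @ W.T + EPS)
        elif cost == "is":
            W *= (B.T @ (V / V_hat**2)) / (B.T @ (1.0 / V_hat) + EPS)
            V_hat = B @ W + EPS
            B *= ((V / V_hat**2) @ W.T) / ((1.0 / V_hat) @ W.T + EPS)
        else:
            raise ValueError("cost must be 'kl' or 'is'")
    return B, W


def combined_cost(V, V_hat, W, alpha=0.5, lam=0.1):
    """Hypothetical decomposition-stage cost: a linear combination of the KL
    and IS divergences plus a temporal continuity penalty on the weights.

    The penalty here is the sum of squared differences between adjacent
    frames of W; this form, alpha, and lam are illustrative assumptions.
    """
    continuity = float(np.sum((W[:, 1:] - W[:, :-1]) ** 2))
    return (alpha * kl_divergence(V, V_hat)
            + (1.0 - alpha) * is_divergence(V, V_hat)
            + lam * continuity)
```

Under these assumptions, a music basis could be trained with `nmf(V_music, rank, cost="is")` and a speech basis with `nmf(V_speech, rank, cost="kl")`, where `V_music` and `V_speech` are placeholder names for magnitude or power spectrograms. The scale invariance mentioned in the abstract shows up directly: scaling both arguments of `is_divergence` by the same constant leaves its value unchanged, whereas `kl_divergence` scales linearly with that constant.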

Citation

Pages: 107-111

Page count: 5

References: 9 in total

[1] Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis [J].

Fevotte, Cedric ;

Bertin, Nancy ;

Durrieu, Jean-Louis .

NEURAL COMPUTATION, 2009, 21 (03) :793-830

[2] Grais Emad M., 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), P3734, DOI 10.1109/ICASSP.2014.6854299

[3] Grais E. M., 2011, 2011 17th International Conference on Digital Signal Processing, P1

[4] Source separation using regularized NMF with MMSE estimates under GMM priors with online learning for the uncertainties [J].

Grais, Emad M. ;

Erdogan, Hakan .

DIGITAL SIGNAL PROCESSING, 2014, 29 :20-34

[5] Regularized nonnegative matrix factorization using Gaussian mixture priors for supervised single channel source separation [J].

Grais, Emad M. ;

Erdogan, Hakan .

COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03) :746-762

[6] Lee DD, 2001, ADV NEUR IN, V13, P556

[7] Ozerov A, 2011, INT CONF ACOUST SPEE, P257

[8] Performance measurement in blind audio source separation [J].

Vincent, Emmanuel ;

Gribonval, Remi ;

Févotte, Cedric .

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (04) :1462-1469

[9] Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria [J].

Virtanen, Tuomas .

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (03) :1066-1074
