Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria

Cited by: 693

Authors:

Virtanen, Tuomas [1]

Affiliations:

[1] Tampere Univ Technol, FI-33101 Tampere, Finland

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 03

Funding:

Academy of Finland;

Keywords:

acoustic signal analysis; audio source separation; blind source separation; music; nonnegative matrix factorization; sparse coding; unsupervised learning;

DOI:

10.1109/TASL.2006.885253

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method enables better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements.
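The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's exact algorithm: it assumes a squared-error reconstruction term, an unnormalized temporal-continuity penalty (sum of squared gain differences between adjacent frames), an L1 sparseness penalty on the gains, and heuristic multiplicative updates obtained by splitting each gradient into its positive and negative parts. The function name `nmf_tc_sparse`, the weights `alpha` and `beta`, and the unit-norm rescaling of the spectra are all assumptions made for illustration.

```python
import numpy as np

def nmf_tc_sparse(X, n_components, alpha=0.1, beta=0.01,
                  n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative spectrogram X (freq x frames) as B @ G.

    Illustrative cost (an assumption, not the paper's exact criterion):
        0.5 * ||X - B G||_F^2
        + alpha * sum_{j,t} (G[j, t] - G[j, t-1])^2   # temporal continuity
        + beta  * sum_{j,t} G[j, t]                   # sparseness
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    # Random nonnegative initialization, as described in the abstract.
    B = rng.random((n_freq, n_components)) + eps
    G = rng.random((n_components, n_frames)) + eps

    # Number of temporal neighbours per frame (1 at the edges, 2 inside).
    n_nb = np.full(n_frames, 2.0)
    n_nb[0] = n_nb[-1] = 1.0

    for _ in range(n_iter):
        # Gains of the previous and next frame, zero-padded at the edges.
        prev = np.zeros_like(G)
        prev[:, 1:] = G[:, :-1]
        nxt = np.zeros_like(G)
        nxt[:, :-1] = G[:, 1:]

        # Multiplicative gain update: negative gradient part / positive part.
        num = B.T @ X + 2.0 * alpha * (prev + nxt)
        den = B.T @ (B @ G) + 2.0 * alpha * n_nb * G + beta + eps
        G *= num / den

        # Multiplicative spectra update (plain squared-error NMF rule).
        B *= (X @ G.T) / (B @ (G @ G.T) + eps)

        # Fix the scale ambiguity: unit-norm spectra, gains absorb the scale,
        # so the penalties cannot be reduced by merely shrinking G.
        norms = np.linalg.norm(B, axis=0) + eps
        B /= norms
        G *= norms[:, None]

    return B, G
```

Each column of `B` plays the role of a component's fixed magnitude spectrum and each row of `G` its time-varying gain; because every factor in the updates is nonnegative, `B` and `G` stay nonnegative throughout. In this sketch, grouping components into sources is left out entirely.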

Pages: 1066 - 1074

Page count: 9

50 items in total

[1] SPARSENESS-BASED MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION FOR BLIND SOURCE SEPARATION
Higuchi, Takuya
Yoshioka, Takuya
Nakatani, Tomohiro
2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,
[2] NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION
Becker, Julian M.
Sohn, Christian
Rohlfing, Christian
2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 316 - 320
[3] Sequential Initialization of Multichannel Nonnegative Matrix Factorization for Sound Source Separation
Uramoto, Takanobu
Tachioka, Yuuki
Narita, Tomohiro
Miura, Iori
Uenohara, Shingo
Furuya, Ken'ichi
2017 IEEE 6TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE), 2017,
[4] A STRUCTURED NONNEGATIVE MATRIX FACTORIZATION FOR SOURCE SEPARATION
Laroche, Clement
Kowalski, Matthieu
Papadopoulos, Helene
Richard, Gael
2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2033 - 2037
[5] Supervised and Constrained Nonnegative Matrix Factorization with Sparseness for Image Representation
Xibiao Cai
Fuming Sun
Wireless Personal Communications, 2018, 102 : 3055 - 3066
[6] Incremental Nonnegative Matrix Factorization with Sparseness Constraint for Image Representation
Sun, Jing
Wang, Zhihui
Li, Haojie
Sun, Fuming
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2018, PT II, 2018, 11165 : 351 - 360
[7] Supervised and Constrained Nonnegative Matrix Factorization with Sparseness for Image Representation
Cai, Xibiao
Sun, Fuming
WIRELESS PERSONAL COMMUNICATIONS, 2018, 102 (04) : 3055 - 3066
[8] Monaural Singing Voice Separation by Non-negative Matrix Partial Co-Factorization with Temporal Continuity and Sparsity Criteria
Hu, Ying
Wang, Liejun
Huang, Hao
Zhou, Gang
INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2016, PT III, 2016, 9773 : 33 - 43
[9] Bayesian Factorization and Learning for Monaural Source Separation
Chien, Jen-Tzung
Yang, Po-Kai
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (01) : 185 - 195
[10] Initialization of Nonnegative Matrix Factorization Dictionaries for Single Channel Source Separation
Grais, Emad M.
Erdogan, Hakan
2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013,
