Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria

Cited by: 693
Authors
Virtanen, Tuomas [1]
Affiliation
[1] Tampere Univ Technol, FI-33101 Tampere, Finland
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007, Vol. 15, No. 3
Funding
Academy of Finland;
Keywords
acoustic signal analysis; audio source separation; blind source separation; music; nonnegative matrix factorization; sparse coding; unsupervised learning;
DOI
10.1109/TASL.2006.885253
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method achieves better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements.
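The procedure outlined in the abstract can be illustrated with a minimal NumPy sketch. It is a simplified illustration under stated assumptions, not the paper's exact algorithm: the abstract does not name the reconstruction error measure or any normalization, so the sketch assumes a squared Euclidean error, an unnormalized sum of squared gain differences for temporal continuity, and a plain sum of gains as the sparseness penalty. The function name `nmf_temporal_sparse` and the weights `alpha` and `beta` are hypothetical.

```python
import numpy as np


def nmf_temporal_sparse(X, n_components, alpha=0.1, beta=0.01,
                        n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative spectrogram X (freq x frames) as B @ G.

    Assumed cost (illustrative, not taken from the paper):
        0.5 * ||X - B G||^2
        + alpha * 0.5 * sum_k sum_t (G[k, t] - G[k, t-1])^2
        + beta * sum(G)
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    # Random nonnegative initialization of spectra (B) and gains (G).
    B = rng.random((n_freq, n_components)) + eps
    G = rng.random((n_components, n_frames)) + eps

    # Each frame has two adjacent frames, except the first and last.
    n_neighbors = np.full(n_frames, 2.0)
    n_neighbors[[0, -1]] = 1.0

    for _ in range(n_iter):
        # Sum of the gains in the adjacent frames (zero outside the signal).
        neighbor_sum = np.zeros_like(G)
        neighbor_sum[:, 1:] += G[:, :-1]
        neighbor_sum[:, :-1] += G[:, 1:]

        # Multiplicative gain update: the negative gradient terms go in
        # the numerator, the positive ones in the denominator.
        numer = B.T @ X + alpha * neighbor_sum
        denom = B.T @ (B @ G) + alpha * n_neighbors * G + beta + eps
        G *= numer / denom

        # Multiplicative spectrum update (reconstruction error only).
        B *= (X @ G.T) / ((B @ G) @ G.T + eps)

        # Fix the scale ambiguity: unit-norm spectra, gains rescaled so
        # that the product B @ G is unchanged.
        norms = np.linalg.norm(B, axis=0) + eps
        B /= norms
        G *= norms[:, None]

    return B, G


if __name__ == "__main__":
    # Synthetic rank-2 "spectrogram" for demonstration only.
    rng = np.random.default_rng(1)
    X = rng.random((20, 2)) @ rng.random((2, 30))
    B, G = nmf_temporal_sparse(X, n_components=2)
    print("relative error:", np.linalg.norm(X - B @ G) / np.linalg.norm(X))
```

Because the gradient is split into nonnegative parts and the updates only multiply by nonnegative ratios, `B` and `G` stay nonnegative throughout. The rescaling step changes the value of the penalty terms, so this sketch does not guarantee a monotonically decreasing cost.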
Pages: 1066 - 1074
Number of pages: 9
Related papers
50 in total
  • [1] SPARSENESS-BASED MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION FOR BLIND SOURCE SEPARATION
    Higuchi, Takuya
    Yoshioka, Takuya
    Nakatani, Tomohiro
    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,
  • [2] NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION
    Becker, Julian M.
    Sohn, Christian
    Rohlfing, Christian
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 316 - 320
  • [3] Sequential Initialization of Multichannel Nonnegative Matrix Factorization for Sound Source Separation
    Uramoto, Takanobu
    Tachioka, Yuuki
    Narita, Tomohiro
    Miura, Iori
    Uenohara, Shingo
    Furuya, Ken'ichi
    2017 IEEE 6TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE), 2017,
  • [4] A STRUCTURED NONNEGATIVE MATRIX FACTORIZATION FOR SOURCE SEPARATION
    Laroche, Clement
    Kowalski, Matthieu
    Papadopoulos, Helene
    Richard, Gael
    2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2033 - 2037
  • [5] Supervised and Constrained Nonnegative Matrix Factorization with Sparseness for Image Representation
    Xibiao Cai
    Fuming Sun
    Wireless Personal Communications, 2018, 102 : 3055 - 3066
  • [6] Incremental Nonnegative Matrix Factorization with Sparseness Constraint for Image Representation
    Sun, Jing
    Wang, Zhihui
    Li, Haojie
    Sun, Fuming
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2018, PT II, 2018, 11165 : 351 - 360
  • [7] Supervised and Constrained Nonnegative Matrix Factorization with Sparseness for Image Representation
    Cai, Xibiao
    Sun, Fuming
    WIRELESS PERSONAL COMMUNICATIONS, 2018, 102 (04) : 3055 - 3066
  • [8] Monaural Singing Voice Separation by Non-negative Matrix Partial Co-Factorization with Temporal Continuity and Sparsity Criteria
    Hu, Ying
    Wang, Liejun
    Huang, Hao
    Zhou, Gang
    INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2016, PT III, 2016, 9773 : 33 - 43
  • [9] Bayesian Factorization and Learning for Monaural Source Separation
    Chien, Jen-Tzung
    Yang, Po-Kai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (01) : 185 - 195
  • [10] Initialization of Nonnegative Matrix Factorization Dictionaries for Single Channel Source Separation
    Grais, Emad M.
    Erdogan, Hakan
    2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013,