Nonnegative Matrix Factorization With Basis Clustering Using Cepstral Distance Regularization

Cited by: 5
Authors
Kameoka, Hirokazu [1]
Higuchi, Takuya [1]
Tanaka, Mikihiro [2]
Li, Li [3]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Tokyo 2430198, Japan
[2] Univ Tokyo, Tokyo 1138656, Japan
[3] Univ Tsukuba, Tsukuba, Ibaraki 3058577, Japan
Keywords
Audio source separation; nonnegative matrix factorization (NMF); basis clustering; mel-frequency cepstral coefficient (MFCC); majorization-minimization algorithm; MODEL; NMF
DOI
10.1109/TASLP.2018.2795746
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
One successful approach to audio source separation applies nonnegative matrix factorization (NMF) to a magnitude spectrogram regarded as a nonnegative matrix. This can be interpreted as approximating the observed spectrum at each time frame as a linear sum of basis spectra scaled by time-varying amplitudes. This paper addresses unsupervised instrument-wise source separation of polyphonic signals based on an extension of the NMF approach. We focus on the fact that each piece of music is typically played on a handful of musical instruments, which allows us to assume that the spectra of the underlying audio events in a polyphonic signal can be grouped into a reasonably small number of clusters in the mel-frequency cepstral coefficient (MFCC) domain. Based on this assumption, we formulate the factorization of a magnitude spectrogram and the clustering of the basis spectra in the MFCC domain as a joint optimization problem and derive a novel optimization algorithm based on the majorization-minimization principle. Experimental results showed that our method outperformed a two-stage algorithm that performs factorization followed by clustering of the basis spectra, demonstrating the advantage of the joint optimization approach.
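The interpretation in the abstract (each observed spectrum approximated as a linear sum of basis spectra scaled by time-varying amplitudes) can be illustrated with plain NMF. The sketch below is not the joint factorization-and-clustering algorithm proposed in the paper; it only shows the underlying factorization, using the standard multiplicative updates for the generalized Kullback-Leibler divergence. The function names `nmf_kl` and `kl_divergence`, the random toy matrix standing in for a magnitude spectrogram, and all parameter values are assumptions made for this example.

```python
import numpy as np


def nmf_kl(Y, n_bases, n_iter=200, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix Y (frequency x time) as H @ U.

    Columns of H play the role of basis spectra; rows of U are the
    time-varying amplitudes. Uses multiplicative updates for the
    generalized KL divergence (illustrative, not the paper's method).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    H = rng.random((n_freq, n_bases)) + eps  # basis spectra
    U = rng.random((n_bases, n_time)) + eps  # amplitudes per frame
    for _ in range(n_iter):
        X = H @ U + eps                      # current model spectrogram
        H *= ((Y / X) @ U.T) / (U.sum(axis=1) + eps)
        X = H @ U + eps
        U *= (H.T @ (Y / X)) / (H.sum(axis=0)[:, None] + eps)
    return H, U


def kl_divergence(Y, X, eps=1e-12):
    """Generalized KL divergence between Y and its approximation X."""
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))


# Toy stand-in for a magnitude spectrogram: 3 hidden bases, 30 frames.
toy_rng = np.random.default_rng(1)
Y_toy = toy_rng.random((20, 3)) @ toy_rng.random((3, 30))
H_toy, U_toy = nmf_kl(Y_toy, n_bases=3)
print("divergence after fitting:", kl_divergence(Y_toy, H_toy @ U_toy))
```

A two-stage baseline of the kind mentioned in the abstract would take the columns of `H` from such a factorization and cluster them afterwards; the paper instead treats factorization and clustering as a single optimization problem.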
Pages: 1025-1036
Number of pages: 12