Hybrid Projective Nonnegative Matrix Factorization With Drum Dictionaries for Harmonic/Percussive Source Separation

Cited by: 3
Authors
Laroche, Clement [1, 2]
Kowalski, Matthieu [2]
Papadopoulos, Helene [2]
Richard, Gael [1]
Affiliations
[1] Univ Paris Saclay, Telecom ParisTech, LTCI, F-75013 Paris, France
[2] Univ Paris Sud, Cent Supelec, CNRS, UMR 8506,Lab Signaux & Syst, F-91192 Gif Sur Yvette, France
Keywords
Nonnegative matrix factorization; projective nonnegative matrix factorization; audio source separation; harmonic/percussive decomposition; polyphonic music; melody extraction; speech signals; transcription; decomposition; algorithms
DOI
10.1109/TASLP.2018.2830116
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
One of the most general models of music signals considers that such signals can be represented as a sum of two distinct components: a tonal part that is sparse in frequency and temporally stable, and a transient (or percussive) part composed of short-term broadband sounds. In this paper, we propose a novel hybrid method built upon nonnegative matrix factorization (NMF) that decomposes the time-frequency representation of an audio signal into these two components. The tonal part is estimated by a sparse and orthogonal nonnegative decomposition, and the transient part is estimated by a straightforward NMF decomposition constrained by a pre-learned dictionary of smooth spectra. The optimization problem at the heart of our method remains simple, with very few hyperparameters, and can be solved with simple multiplicative update rules. An extensive benchmark on a large and varied music database against four state-of-the-art harmonic/percussive source separation algorithms demonstrates the merit of the proposed approach.
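The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state the cost function or the update rules, so everything below is an assumption made for illustration: the tonal part is modeled as a projective term `W_H @ W_H.T @ V`, the transient part as `W_P @ H_P` with a fixed dictionary `W_P`, the cost is the squared Euclidean distance, and the square-root damping on the `W_H` update is a common stabilization heuristic rather than the authors' rule. The function name `hybrid_pnmf_sketch` and all parameter names are hypothetical.

```python
import numpy as np


def hybrid_pnmf_sketch(V, W_P, rank_h=3, n_iter=50, eps=1e-12, seed=0):
    """Split a nonnegative spectrogram V into tonal and transient estimates.

    Assumed model (not taken from the abstract): V ~ W_H W_H^T V + W_P H_P,
    where W_P is a fixed, pre-learned dictionary and only W_H, H_P are updated.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W_H = rng.random((n_freq, rank_h)) + eps           # tonal basis (learned)
    H_P = rng.random((W_P.shape[1], n_frames)) + eps   # dictionary activations
    VVt = V @ V.T

    for _ in range(n_iter):
        # Multiplicative update of the activations of the fixed dictionary.
        V_hat = W_H @ (W_H.T @ V) + W_P @ H_P
        H_P *= (W_P.T @ V) / (W_P.T @ V_hat + eps)

        # Multiplicative update of the projective basis: the gradient of the
        # Euclidean cost is split into negative (num) and positive (den) parts.
        V_hat = W_H @ (W_H.T @ V) + W_P @ H_P
        num = 2.0 * (VVt @ W_H)
        den = V_hat @ (V.T @ W_H) + V @ (V_hat.T @ W_H) + eps
        W_H *= np.sqrt(num / den)  # square root damps scale oscillations

    # Soft masks that share the energy of V between the two model parts.
    V_H = W_H @ (W_H.T @ V)
    V_P = W_P @ H_P
    total = V_H + V_P + eps
    return V * V_H / total, V * V_P / total, W_H, H_P


# Toy usage with random data standing in for a magnitude spectrogram and a
# pre-learned drum dictionary (both are placeholders, not real audio).
rng = np.random.default_rng(1)
V = rng.random((20, 30)) + 0.1
W_P = rng.random((20, 5)) + 0.1
harmonic, percussive, W_H, H_P = hybrid_pnmf_sketch(V, W_P)
```

Because both updates only multiply by nonnegative ratios, the factors stay nonnegative without any explicit projection. In this sketch, sparsity and near-orthogonality of `W_H` are left to the projective structure itself, with no explicit penalty term.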
Pages: 1499-1511
Number of pages: 13