Hybrid Projective Nonnegative Matrix Factorization With Drum Dictionaries for Harmonic/Percussive Source Separation

Cited by: 3

Authors:

Laroche, Clement [1, 2]

Kowalski, Matthieu [2]

Papadopoulos, Helene [2]

Richard, Gael [1]

Affiliations:

[1] Univ Paris Saclay, Telecom ParisTech, LTCI, F-75013 Paris, France

[2] Univ Paris Sud, Cent Supelec, CNRS, UMR 8506, Lab Signaux & Syst, F-91192 Gif Sur Yvette, France

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2018 / Vol. 26 / No. 09

Keywords:

Nonnegative matrix factorization; projective nonnegative matrix factorization; audio source separation; harmonic/percussive decomposition; POLYPHONIC MUSIC; MELODY EXTRACTION; SPEECH SIGNALS; TRANSCRIPTION; DECOMPOSITION; ALGORITHMS;

DOI:

10.1109/TASLP.2018.2830116

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403

Abstract:

One of the most general models of music signals represents them as a sum of two distinct components: a tonal part that is sparse in frequency and temporally stable, and a transient (or percussive) part composed of short-term broadband sounds. In this paper, we propose a novel hybrid method built upon nonnegative matrix factorization (NMF) that decomposes the time-frequency representation of an audio signal into these two components. The tonal part is estimated by a sparse and orthogonal nonnegative decomposition, and the transient part is estimated by a straightforward NMF decomposition constrained by a pre-learned dictionary of smooth spectra. The optimization problem at the heart of our method remains simple, with very few hyperparameters, and can be solved with simple multiplicative update rules. An extensive benchmark on a large and varied music database against four state-of-the-art harmonic/percussive source separation algorithms demonstrates the merit of the proposed approach.


Pages: 1499-1511

Page count: 13

