Optimal spectral transportation with application to music transcription

Cited by: 0
Authors
Flamary, Remi [1]
Fevotte, Cedric [2]
Courty, Nicolas [3]
Emiya, Valentin [4]
Affiliations
[1] Univ Cote d'Azur, CNRS, OCA, Nice, France
[2] IRIT, CNRS, Toulouse, France
[3] Univ Bretagne Sud, CNRS, IRISA, Lorient, France
[4] Aix Marseille Univ, CNRS, LIF, Marseille, France
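Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016) | 2016, Vol. 29
Funding
European Research Council
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract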
Many spectral unmixing methods rely on the non-negative decomposition of spectral data onto a dictionary of spectral templates. In particular, state-of-the-art music transcription systems decompose the spectrogram of the input signal onto a dictionary of representative note spectra. The typical measures of fit used to quantify the adequacy of the decomposition compare the data and template entries frequency-wise. As such, small displacements of energy from one frequency bin to another, as well as variations of timbre, can disproportionately harm the fit. We address these issues by means of optimal transportation and propose a new measure of fit that treats the frequency distributions of energy holistically as opposed to frequency-wise. Building on the harmonic nature of sound, the new measure is invariant to shifts of energy to harmonically related frequencies, as well as to small and local displacements of energy. Equipped with this new measure of fit, the dictionary of note templates can be considerably simplified to a set of Dirac vectors located at the target fundamental frequencies (musical pitch values). This in turn gives rise to a very fast and simple decomposition algorithm that achieves state-of-the-art performance on real musical data.
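The idea in the abstract can be illustrated with a small sketch. It is not taken from the paper: the squared-distance-to-nearest-harmonic cost, the function names `harmonic_cost` and `dirac_unmix`, the `n_harmonics` parameter, and the toy spectrum are all assumptions made for illustration. With Dirac templates, transporting each frequency bin's energy to its cheapest pitch plausibly reduces the decomposition to a nearest-pitch lookup followed by a sum.

```python
import numpy as np

def harmonic_cost(freqs, pitches, n_harmonics=8):
    """Assumed cost: C[i, k] = min over q of (freqs[i] - q * pitches[k])**2.

    Energy sitting on (or near) any of the first n_harmonics multiples
    of a pitch is cheap to move to that pitch.
    """
    q = np.arange(1, n_harmonics + 1)
    # diff has shape (n_freqs, n_pitches, n_harmonics)
    diff = freqs[:, None, None] - q[None, None, :] * pitches[None, :, None]
    return (diff ** 2).min(axis=2)

def dirac_unmix(spectrum, cost):
    """Send each bin's energy to its lowest-cost pitch and sum per pitch."""
    spectrum = spectrum / spectrum.sum()   # treat the spectrum as a distribution
    nearest = cost.argmin(axis=1)          # cheapest pitch for every frequency bin
    return np.bincount(nearest, weights=spectrum, minlength=cost.shape[1])

# Toy example (hypothetical data): a 5 Hz frequency grid and two candidate pitches.
freqs = np.linspace(0.0, 2000.0, 401)
pitches = np.array([220.0, 330.0])

spectrum = np.zeros_like(freqs)
spectrum[45] = 0.4    # 225 Hz: slightly displaced from the 220 Hz fundamental
spectrum[88] = 0.2    # 440 Hz: second harmonic of 220 Hz
spectrum[66] = 0.3    # 330 Hz: fundamental of the second pitch
spectrum[198] = 0.1   # 990 Hz: third harmonic of 330 Hz

activations = dirac_unmix(spectrum, harmonic_cost(freqs, pitches))
print(activations)
```

In this toy case the displaced 225 Hz energy and the 440 Hz harmonic are both attributed to the 220 Hz pitch, which is the kind of invariance the abstract describes. Frequencies that are harmonics of several candidate pitches (660 Hz here) would be ambiguous under this simplified cost, so the example avoids them.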
Pages: 9