Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription

Cited by: 117
Authors
Bertin, Nancy [1]
Badeau, Roland [1]
Vincent, Emmanuel [2]
Affiliations
[1] TELECOM ParisTech, Inst TELECOM, LTCI CNRS, Dept Traitement Signal & Images, F-75634 Paris, France
[2] INRIA, Ctr Inria Rennes Bretagne Atlantique, F-35042 Rennes, France
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010, Vol. 18, No. 3
Keywords
Audio source separation; Bayesian regression; music transcription; non-negative matrix factorization (NMF); unsupervised machine learning
DOI
10.1109/TASL.2010.2041381
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents theoretical and experimental results on constrained non-negative matrix factorization (NMF) in a Bayesian framework. A model of superimposed Gaussian components including harmonicity is proposed, while temporal continuity is enforced through an inverse-Gamma Markov chain prior. We then present a space-alternating generalized expectation-maximization (SAGE) algorithm to estimate the parameters. Computational time is reduced by initializing the system with an original variant of multiplicative harmonic NMF, which is described as well. The algorithm is then applied to polyphonic piano music transcription and compared to other state-of-the-art algorithms, especially NMF-based ones. Convergence issues are also discussed from both theoretical and experimental points of view. Bayesian NMF with harmonicity and temporal continuity constraints is shown to outperform other standard NMF-based transcription systems, providing a meaningful mid-level representation of the data. However, temporal smoothness has its drawbacks, particularly where transients are concerned, and can be detrimental to transcription performance when it is the only constraint used. Possible improvements of the temporal prior are discussed.
Pages: 538-549
Number of pages: 12