Harmonic/Percussive Sound Separation Based on Anisotropic Smoothness of Spectrograms

Cited by: 15
Authors
Tachibana, Hideyuki [1]
Ono, Nobutaka [2, 3]
Kameoka, Hirokazu [1, 4]
Sagayama, Shigeki [2]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138656, Japan
[2] Natl Inst Informat, Tokyo 1010003, Japan
[3] Grad Univ Adv Studies SOKENDAI, Tokyo 1018430, Japan
[4] NTT Commun Sci Lab, Atsugi, Kanagawa 2430198, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Audio source separation; harmonic instruments; music signal processing; percussion; nonnegative matrix factorization; music; patterns; signals
DOI
10.1109/TASLP.2014.2351131
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper describes a method to separate a monaural music signal into harmonic components, e.g., a guitar, and percussive components, e.g., a snare drum. Separating these two components is a useful preprocessing step for many music information retrieval applications. In addition, it can serve as a new kind of music equalizer in its own right, one that lets a listener freely adjust the volume ratio of the guitar and the drum. Because of these potential applications, there have been many attempts to develop such a technique, especially in the last decade. However, some state-of-the-art techniques rely on costly operations, such as multiplication of large matrices or Monte Carlo methods, which may be a barrier to practical use on small computers such as smartphones. In this paper, an efficient method that does not depend on these costly operations is described. In formulating the methods, the authors assumed essentially only the "anisotropic smoothness" of music spectrograms, which may be one of the most minimal models that reflect the nature of these instruments. Specifically, the authors assumed only that, on a music spectrogram, harmonic instruments are smooth in time, while percussive instruments are smooth in frequency. On the basis of this assumption, source separation methods are formulated as optimization problems that optimize the "anisotropic smoothness" under some conditions. Because the model is simple, the derived algorithms are also quite simple. Experimental results show that the methods were effective compared to a state-of-the-art technique, and that the computation time was much shorter than that of an existing method; specifically, the methods can process a three-minute song in around 4-20 seconds on a laptop PC.
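The "anisotropic smoothness" assumption in the abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm or objective function; it only measures how rough a small hand-built spectrogram is along time and along frequency, using sums of squared neighbor differences. The function names, the `spec[frequency][time]` layout, and the toy spectrograms are all assumptions made for illustration.

```python
def time_roughness(spec):
    """Sum of squared differences between adjacent time frames (columns)."""
    return sum(
        (row[t + 1] - row[t]) ** 2
        for row in spec
        for t in range(len(row) - 1)
    )


def freq_roughness(spec):
    """Sum of squared differences between adjacent frequency bins (rows)."""
    return sum(
        (spec[f + 1][t] - spec[f][t]) ** 2
        for f in range(len(spec) - 1)
        for t in range(len(spec[0]))
    )


def toy_spectrogram(n_freq, n_time, harmonic_bin=None, percussive_frame=None):
    """Build a toy magnitude spectrogram indexed as spec[frequency][time]."""
    spec = [[0.0] * n_time for _ in range(n_freq)]
    if harmonic_bin is not None:
        # Sustained tone: energy in one frequency bin across all frames.
        spec[harmonic_bin] = [1.0] * n_time
    if percussive_frame is not None:
        # Broadband click: energy in all frequency bins at one frame.
        for f in range(n_freq):
            spec[f][percussive_frame] = 1.0
    return spec


def anisotropic_cost(harmonic_part, percussive_part):
    """Illustrative cost: penalize time roughness of the harmonic part
    and frequency roughness of the percussive part (equal weights assumed)."""
    return time_roughness(harmonic_part) + freq_roughness(percussive_part)


harmonic = toy_spectrogram(8, 8, harmonic_bin=3)
percussive = toy_spectrogram(8, 8, percussive_frame=3)

print(time_roughness(harmonic), freq_roughness(harmonic))      # 0.0 16.0
print(time_roughness(percussive), freq_roughness(percussive))  # 16.0 0.0
print(anisotropic_cost(harmonic, percussive))                  # 0.0
print(anisotropic_cost(percussive, harmonic))                  # 32.0
```

The sustained tone is perfectly smooth in time and the click is perfectly smooth in frequency, so assigning each to the matching component gives zero cost, while swapping them is penalized. A separation method built on this idea would search for the split of a mixture spectrogram that makes such a cost small, subject to the two parts adding up to the mixture.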
Pages: 2059-2073
Page count: 15