Harmonic/Percussive Sound Separation Based on Anisotropic Smoothness of Spectrograms

Cited by: 15

Authors:

Tachibana, Hideyuki [1]

Ono, Nobutaka [2,3]

Kameoka, Hirokazu [1,4]

Sagayama, Shigeki [2]

Affiliations:

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138656, Japan

[2] Natl Inst Informat, Tokyo 1010003, Japan

[3] Grad Univ Adv Studies SOKENDAI, Tokyo 1018430, Japan

[4] NTT Commun Sci Lab, Atsugi, Kanagawa 2430198, Japan

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2014 / Vol. 22 / No. 12

Funding:

Japan Society for the Promotion of Science;

Keywords:

Audio source separation; harmonic instruments; music signal processing; percussion; NONNEGATIVE MATRIX FACTORIZATION; MUSIC; PATTERNS; SIGNALS;

DOI:

10.1109/TASLP.2014.2351131

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper describes a method to separate a monaural music signal into harmonic components (e.g., a guitar) and percussive components (e.g., a snare drum). Separating these two components is a useful preprocessing step for many music information retrieval applications, and it can also serve as a new kind of music equalizer in its own right, letting a listener freely adjust the volume ratio between the guitar and the drum. Because of these potential applications, there have been many attempts to develop such a technique, especially in the last decade. However, some state-of-the-art techniques have the drawback that they rely on costly operations, such as multiplication of large matrices or Monte Carlo methods, which may be a barrier to practical use on small computers such as smartphones. In this paper, an efficient method that does not depend on these costly operations is described. In formulating the methods, the authors essentially assumed only the "anisotropic smoothness" of the music spectrogram, which can be one of the most minimal models that reflect the nature of these instruments. Specifically, the authors assumed only that, on a music spectrogram, harmonic instruments are smooth in time while percussive instruments are smooth in frequency. On the basis of this assumption, source separation methods are formulated as optimization problems that optimize the "anisotropic smoothness" under some conditions. Because of the simplicity of the model, the derived algorithms are quite simple. Experimental results show that the methods were effective compared to a state-of-the-art technique, and that the computation time was much shorter than that of an existing method; specifically, a three-minute song can be processed in around 4-20 seconds on a laptop PC.
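The "anisotropic smoothness" assumption in the abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm or cost function (the abstract does not give them); it assumes, purely for illustration, that smoothness is measured by the sum of squared differences between neighbouring spectrogram bins. The function name `smoothness_costs` and the 8x8 toy spectrograms are hypothetical.

```python
import numpy as np

def smoothness_costs(spec):
    """Return (time_cost, freq_cost) for a spectrogram of shape
    (frequency bins, time frames). Each cost is the sum of squared
    differences between neighbouring bins along that axis; a low value
    means the spectrogram is smooth in that direction.
    NOTE: this squared-difference measure is an illustrative assumption."""
    time_cost = float(np.sum(np.diff(spec, axis=1) ** 2))  # along time frames
    freq_cost = float(np.sum(np.diff(spec, axis=0) ** 2))  # along frequency bins
    return time_cost, freq_cost

# Toy magnitude spectrograms (hypothetical data).
harmonic = np.zeros((8, 8))
harmonic[3, :] = 1.0      # sustained tone: a horizontal ridge

percussive = np.zeros((8, 8))
percussive[:, 5] = 1.0    # drum hit: a vertical ridge

h_time, h_freq = smoothness_costs(harmonic)
p_time, p_freq = smoothness_costs(percussive)

print("harmonic   time/freq cost:", h_time, h_freq)
print("percussive time/freq cost:", p_time, p_freq)
```

Under this assumed measure, the sustained tone has zero cost along time and a nonzero cost along frequency, while the drum hit shows the opposite pattern. That asymmetry is the kind of cue an optimization-based separation could exploit.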


Pages: 2059-2073

Page count: 15
