Clustering Algorithm for Unsupervised Monaural Musical Sound Separation Based on Non-negative Matrix Factorization

Times cited: 2
Authors
Park, Sang Ha [1]
Lee, Seokjin [1]
Sung, Koeng-Mo [1]
Affiliation
[1] Seoul Natl Univ, INMC, Seoul 151742, South Korea
Keywords
non-negative matrix factorization; clustering; musical sound source separation
DOI
10.1587/transfun.E95.A.818
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, NMF separates a musical mixture into more components than there are musical tracks, so an additional clustering step is required. Conventional methods cluster these components either manually or with a training-based approach that needs an additional learning process. Recently, a clustering algorithm based on mel-frequency cepstral coefficients (MFCCs) was proposed for unsupervised clustering, but MFCCs alone provide limited information for clustering. In this paper, we propose several timbre features for unsupervised clustering, together with a clustering algorithm that uses them. Simulation experiments on a variety of musical sound mixtures indicate that the proposed method improves clustering performance compared with conventional MFCC-based clustering.
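The abstract describes a two-stage pipeline. First, NMF splits the mixture's magnitude spectrogram into more components than there are tracks. Second, a clustering step groups those components into tracks. The sketch below illustrates that pipeline under stated assumptions and is not the authors' method. NMF uses the Lee-Seung multiplicative updates for the Euclidean cost. A single spectral-centroid value per basis vector, clustered with k-means, stands in for the paper's timbre features and clustering algorithm, which the abstract does not specify. All function names (nmf, spectral_centroid, kmeans, separate) are hypothetical.

```python
import random


def matmul(A, B):
    """Product of A (m x k) and B (k x n), both given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, r, iters=200, seed=0, eps=1e-9):
    """Factorize a non-negative F x T spectrogram V as W (F x r) times H (r x T)
    with Lee-Seung multiplicative updates for the squared Euclidean cost."""
    rng = random.Random(seed)
    W = [[rng.random() + eps for _ in range(r)] for _ in V]
    H = [[rng.random() + eps for _ in V[0]] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(*rows)] for rows in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(*rows)] for rows in zip(W, num, den)]
    return W, H


def squared_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr))


def spectral_centroid(W, k):
    """Hypothetical stand-in timbre feature: centroid (in bins) of basis vector k."""
    col = [row[k] for row in W]
    total = sum(col)
    return sum(f * w for f, w in enumerate(col)) / total if total > 0 else 0.0


def kmeans(points, n_clusters, iters=50):
    """Deterministic k-means with farthest-point initialization."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [list(points[0])]
    while len(centers) < n_clusters:
        centers.append(list(max(points, key=lambda p: min(dist2(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(n_clusters), key=lambda j: dist2(p, centers[j])) for p in points]
        new_centers = []
        for j in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append([sum(c) / len(members) for c in zip(*members)] if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def separate(V, n_components, n_tracks, iters=200):
    """Decompose V with NMF, cluster the components by the stand-in feature,
    and sum each cluster's components into one track spectrogram."""
    W, H = nmf(V, n_components, iters)
    features = [[spectral_centroid(W, k)] for k in range(n_components)]
    labels = kmeans(features, n_tracks)
    tracks = []
    for j in range(n_tracks):
        keep = [k for k in range(n_components) if labels[k] == j]
        if not keep:
            tracks.append([[0.0] * len(V[0]) for _ in V])
            continue
        Wj = [[row[k] for k in keep] for row in W]
        Hj = [H[k] for k in keep]
        tracks.append(matmul(Wj, Hj))
    return labels, tracks


if __name__ == "__main__":
    # Toy 4-bin, 6-frame "spectrogram": a low source and a high source
    V = [[1.0, 2.0, 0.0, 1.0, 0.0, 0.0],
         [2.0, 3.0, 0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0, 2.0, 1.0],
         [0.0, 0.0, 2.0, 0.0, 3.0, 1.0]]
    labels, tracks = separate(V, n_components=4, n_tracks=2)
    print("component-to-track labels:", labels)
```

Because each component is assigned to exactly one cluster, the track spectrograms sum to the full NMF reconstruction W H. A real system would use a time-frequency transform of the audio and richer feature vectors. It would typically also apply a Wiener-style mask before resynthesis, which this sketch omits.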
Pages: 818-823
Number of pages: 6