Multi-Level Local Feature Coding Fusion for Music Genre Recognition

Times cited: 18
Authors
Ng, Wing W. Y. [1]
Zeng, Weijie [1]
Wang, Ting [1]
Affiliation
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Music genre recognition; NetVLAD; self-attention; convolutional neural network; representation learning; convolutional neural networks; acoustic features; classification
DOI
10.1109/ACCESS.2020.3017661
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Music genre recognition (MGR) plays a fundamental role in music indexing and retrieval. Unlike images, music genres are composed of immediate characteristics that are highly diverse and exist at different levels of abstraction. However, most representation learning methods for MGR focus on global features and make decisions from features at a single level. To address these shortcomings, we integrate a convolutional neural network (CNN) with NetVLAD and self-attention to capture local information across levels and learn its long-term dependencies. A meta classifier makes the final MGR decision by learning from the aggregated high-level features of the different local feature coding networks. Experimental results show that the proposed approach achieves higher accuracy than other state-of-the-art models on the GTZAN, ISMIR2004, and Extended Ballroom datasets.
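The abstract names NetVLAD as the local feature coding step. As a rough illustration only, the Python sketch below implements the standard NetVLAD aggregation over one level of local descriptors: each descriptor is softly assigned to K centres, the weighted residuals are summed per centre, and the result is intra-normalized, flattened, and L2-normalized. It is not the authors' implementation. The function names, the toy centres, and the sharpness parameter `alpha` are hypothetical; the centres are fixed here rather than learned; and the CNN, self-attention, and meta-classifier stages are omitted. In the pipeline the abstract describes, one such pooled vector per feature level would presumably be passed on to the meta classifier.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def netvlad_pool(features: Sequence[Sequence[float]],
                 centers: Sequence[Sequence[float]],
                 alpha: float = 10.0) -> Vector:
    """Aggregate N local D-dim descriptors into one K*D NetVLAD vector.

    features: N local descriptors (e.g. one per time frame of a CNN feature map).
    centers:  K cluster centres (learned in a real model; fixed in this sketch).
    alpha:    sharpness of the soft assignment (hypothetical default).
    """
    dim = len(centers[0])
    vlad = [[0.0] * dim for _ in centers]
    for x in features:
        # Soft-assign x to every centre: softmax over -alpha * squared distance.
        logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                  for c in centers]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Accumulate the weighted residual (x - c_k) for each centre k.
        for k, c in enumerate(centers):
            for d in range(dim):
                vlad[k][d] += weights[k] * (x[d] - c[d])
    # Intra-normalize each centre's residual sum, then flatten and L2-normalize.
    flat: Vector = []
    for row in vlad:
        flat.extend(_l2_normalize(row))
    return _l2_normalize(flat)


# Toy usage (hypothetical data): three 2-D frame descriptors, K = 2 centres.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
centers = [[1.0, 0.0], [0.0, 1.0]]
pooled = netvlad_pool(frames, centers)
print(len(pooled))  # K * D = 4
```

Because the residuals are summed over frames, the pooled vector does not depend on frame order; in the approach the abstract outlines, modelling long-term dependencies is attributed to the self-attention component rather than to NetVLAD itself.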
Pages: 152713-152727
Number of pages: 15