Improved Music Genre Classification with Convolutional Neural Networks

Cited by: 46
Authors
Zhang, Weibin [1]
Lei, Wenkang [1]
Xu, Xiangmin [1]
Xing, Xiaofeng [1]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
music genre classification; convolutional neural network; residual learning;
DOI
10.21437/Interspeech.2016-1236
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In recent years, deep neural networks have been shown to be effective in many classification tasks, including music genre classification. In this paper, we propose two ways to improve music genre classification with convolutional neural networks (CNNs): 1) combining max and average pooling to provide more statistical information to higher-level neural networks; 2) using shortcut connections to skip one or more layers, a method inspired by residual learning. The input to the CNN is simply the short-time Fourier transform of the audio signal. The output of the CNN is fed into another deep neural network for classification. Comparing two different network topologies, our preliminary experimental results on the GTZAN data set show that both methods can effectively improve classification accuracy, especially the second one.
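The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual layer sizes, pooling axes, or activation placement, so everything below is an assumption for illustration only: the feature map is treated as a list of time frames with one value per channel, pooling is taken over the time axis, and the shortcut is a plain identity addition followed by a ReLU. The names `max_avg_pool` and `shortcut_block` are hypothetical and do not come from the paper.

```python
from typing import Callable, List

Vector = List[float]
FeatureMap = List[Vector]  # assumed layout: (time_frames, channels)


def max_avg_pool(feature_map: FeatureMap) -> Vector:
    """Pool over the time axis with both max and average, then concatenate.

    The result has twice as many entries as there are channels:
    first the per-channel maxima, then the per-channel means.
    """
    channels = list(zip(*feature_map))  # one tuple of time values per channel
    max_part = [max(ch) for ch in channels]
    avg_part = [sum(ch) / len(ch) for ch in channels]
    return max_part + avg_part


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def shortcut_block(x: Vector, layer: Callable[[Vector], Vector]) -> Vector:
    """Add the block input to the layer output (identity shortcut), then ReLU.

    Assumes `layer` preserves the vector length so the addition is valid.
    """
    fx = layer(x)
    if len(fx) != len(x):
        raise ValueError("layer must preserve the input length")
    return relu([a + b for a, b in zip(fx, x)])


if __name__ == "__main__":
    # Three time frames, two channels (toy values, not real spectrogram data).
    fm = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
    print(max_avg_pool(fm))

    # A toy "layer" that doubles its input, wrapped with a shortcut.
    print(shortcut_block([1.0, -3.0], lambda v: [2.0 * a for a in v]))
```

Concatenating the two pooled vectors gives the following layers both the peak and the mean activation of each channel, which is one plausible reading of "more statistical information"; the shortcut lets the input bypass the wrapped layer unchanged.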
Pages: 3304-3308
Number of pages: 5