Improved Music Genre Classification with Convolutional Neural Networks

Cited by: 46

Authors:

Zhang, Weibin [1]

Lei, Wenkang [1]

Xu, Xiangmin [1]

Xing, Xiaofeng [1]

Affiliation:

[1] South China University of Technology, School of Electronic and Information Engineering, Guangzhou, Guangdong, People's Republic of China

Source:

17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), Vols 1-5: Understanding Speech Processing in Humans and Machines | 2016

Keywords:

music genre classification; convolutional neural network; residual learning

DOI:

10.21437/Interspeech.2016-1236

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In recent years, deep neural networks have been shown to be effective in many classification tasks, including music genre classification. In this paper, we propose two ways to improve music genre classification with convolutional neural networks (CNNs): 1) combining max pooling and average pooling to provide more statistical information to the higher-level layers; 2) using shortcut connections to skip one or more layers, a method inspired by residual learning. The input of the CNN is simply the short-time Fourier transform of the audio signal. The output of the CNN is fed into another deep neural network that performs the classification. By comparing two different network topologies, our preliminary experimental results on the GTZAN data set show that both methods can effectively improve classification accuracy, especially the second one.
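The two ideas named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' architecture: the pooling axis (time), the dense layer standing in for a convolutional layer, and every function name below are assumptions made for illustration only, since this record does not describe the network in detail.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # assumed layout: (time_frames, channels)


def max_avg_pool_over_time(feature_map: Matrix) -> Vector:
    """Pool each channel over time with both max and mean, then concatenate.

    The output has twice as many entries as there are channels, so the next
    layer sees two statistics per channel instead of one.
    """
    n_frames = len(feature_map)
    channels = list(zip(*feature_map))  # one tuple of values per channel
    max_part = [max(ch) for ch in channels]
    avg_part = [sum(ch) / n_frames for ch in channels]
    return max_part + avg_part


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Plain affine layer y = W x + b (hypothetical stand-in for a conv layer)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def shortcut_block(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Residual-style block: relu(F(x) + x), where the '+ x' skips the layer.

    Requires F(x) to have the same length as x (square weight matrix).
    """
    fx = dense(x, weights, bias)
    return relu([f + v for f, v in zip(fx, x)])


# Toy feature map: 3 time frames, 2 channels.
fmap = [[1.0, -2.0], [3.0, 0.0], [2.0, 5.0]]
pooled = max_avg_pool_over_time(fmap)  # [max_c0, max_c1, mean_c0, mean_c1]

# With all-zero weights the layer contributes nothing, so the block
# reduces to relu(x): the input passes through the shortcut unchanged.
zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]
skipped = shortcut_block([1.0, -2.0], zero_w, zero_b)
```

The zero-weight case shows why shortcut connections are attractive: a block can fall back to passing its input through, so adding layers does not have to degrade what earlier layers computed.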


Pages: 3304-3308

Page count: 5
