Automatic Music Genre Classification Using Timbral Texture and Rhythmic Content Features

Cited by: 0

Authors:

Baniya, Babu Kaji [1]

Ghimire, Deepak [1]

Lee, Joonwhoan [1]

Affiliations:

[1] Chonbuk Natl Univ, Div Comp Sci & Engn, Jeonju 761756, South Korea

Source:

2015 17TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT) | 2015

Keywords:

Classification; music genres; ELM (Extreme Learning Machine) with bagging; covariance matrix; timbral texture; rhythmic contents;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Subject classification code:

0808 ; 0809 ;

Abstract:

Music genre classification is a vital component of music information retrieval systems. Two components matter most for accurate genre classification: audio feature extraction and the classifier. This paper combines two kinds of features for genre classification: timbral texture and rhythmic content. The timbral texture features comprise Mel-frequency cepstral coefficients (MFCC) together with several other spectral features. Before selecting the timbral features, we examine which of them contribute least to genre discrimination, which allows the feature dimension to be reduced. For the timbral features, central moments up to the fourth order and the covariance components between features are considered in order to improve the overall classification result. For the rhythmic content, features extracted from the beat histogram are selected. An Extreme Learning Machine (ELM) with bagging is used as the classifier. Based on the proposed feature sets and classifier, experiments are performed on two well-known datasets, GTZAN and ISMIR2004, which contain ten and six music genres, respectively. The proposed method achieves classification accuracy that is better than or competitive with existing approaches on both datasets.
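The abstract names the building blocks but gives no implementation details, so the following is only a minimal sketch of how they could fit together, not the authors' code. It summarises a matrix of frame-level timbral features by its moments up to the fourth order (here as mean, variance, skewness and kurtosis) plus the pairwise covariances, then classifies with a bagged ELM. The function and class names, the tanh activation, the hidden-layer size, the number of bagged models, and the majority-vote rule are all assumptions made for illustration.

```python
import numpy as np


def timbral_summary(frames):
    """Summarise frame-level features of shape (n_frames, n_features).

    Returns mean, variance, skewness, kurtosis per feature, followed by
    the upper-triangular (pairwise) covariance entries.
    """
    mu = frames.mean(axis=0)
    centered = frames - mu
    var = (centered ** 2).mean(axis=0)
    std = np.sqrt(var) + 1e-12  # guard against division by zero
    skew = (centered ** 3).mean(axis=0) / std ** 3
    kurt = (centered ** 4).mean(axis=0) / std ** 4
    cov = centered.T @ centered / len(frames)
    upper = np.triu_indices(frames.shape[1], k=1)
    return np.concatenate([mu, var, skew, kurt, cov[upper]])


class ELM:
    """Single-hidden-layer ELM: random hidden weights, least-squares output."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def fit(self, X, y, n_classes):
        # Hidden weights are drawn once at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        T = np.eye(n_classes)[y]  # one-hot targets
        # Output weights via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def scores(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta


def bagged_elm_predict(X_train, y_train, X_test, n_classes,
                       n_models=10, n_hidden=50, seed=0):
    """Train ELMs on bootstrap resamples and combine them by majority vote."""
    rng = np.random.default_rng(seed)
    votes = np.zeros((len(X_test), n_classes))
    rows = np.arange(len(X_test))
    for _ in range(n_models):
        # Bootstrap: sample the training set with replacement.
        idx = rng.integers(0, len(X_train), size=len(X_train))
        model = ELM(n_hidden, rng).fit(X_train[idx], y_train[idx], n_classes)
        votes[rows, model.scores(X_test).argmax(axis=1)] += 1
    return votes.argmax(axis=1)
```

For d frame-level features the summary vector has 4d + d(d-1)/2 entries, which shows why discarding weakly discriminative features before aggregation reduces the final dimension noticeably. Rhythmic features from a beat histogram would be concatenated to this vector before classification; they are omitted here because the abstract does not say which ones are used.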
