Combining visual and acoustic features for audio classification tasks

Cited by: 62
Authors
Nanni, L. [1]
Costa, Y. M. G. [2]
Lucio, D. R. [2]
Silla, C. N., Jr. [3]
Brahnam, S. [4]
Affiliations
[1] Univ Padua, DEI, I-35100 Padua, Italy
[2] Univ Estadual Maringa, PCC DIN, Maringa, Parana, Brazil
[3] Pontificia Univ Catolica Parana, PPGIa, Curitiba, Parana, Brazil
[4] Missouri State Univ, CIS, Springfield, MO USA
Keywords
Audio classification; Texture; Image processing; Acoustic features; Ensemble of classifiers; Pattern recognition; Genre classification
DOI
10.1016/j.patrec.2017.01.013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel and effective approach to automated audio classification based on the fusion of different sets of features, both visual and acoustic. A number of different acoustic and visual features of sounds are evaluated and compared. These features are then fused in an ensemble that produces better classification accuracy than other state-of-the-art approaches. The visual features of sounds are built starting from the audio file and are taken from images constructed from different spectrograms, a gammatonegram, and a rhythm image. These images are divided into subwindows, from which a set of texture descriptors is extracted. For each feature descriptor a different Support Vector Machine (SVM) is trained. The SVM outputs are summed for a final decision. The proposed ensemble is evaluated on three well-known databases for music genre classification (the Latin Music Database, the ISMIR 2004 database, and the GTZAN genre collection), a dataset of bird vocalizations aimed at species recognition, and a dataset of right whale calls aimed at whale detection. The MATLAB code for the ensemble of classifiers and for the extraction of the features will be publicly available (https://www.dei.unipd.it/node/2357 +Pattern Recognition and Ensemble Classifiers). © 2017 Elsevier B.V. All rights reserved.
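The two generic steps named in the abstract, dividing an image into subwindows and summing the classifier outputs, can be sketched roughly as follows. This is only an illustration in Python and not the authors' MATLAB code. The function names `split_into_subwindows` and `sum_rule_fusion`, the regular grid layout, and the assumption that all classifiers emit scores on a comparable scale are hypothetical choices made for this sketch, since the abstract does not specify them. Texture-descriptor extraction and SVM training are omitted.

```python
def split_into_subwindows(image, n_rows, n_cols):
    """Split a 2-D image (a list of equal-length rows) into an
    n_rows x n_cols grid of rectangular zones (hypothetical zoning)."""
    height, width = len(image), len(image[0])
    zones = []
    for r in range(n_rows):
        top, bottom = r * height // n_rows, (r + 1) * height // n_rows
        for c in range(n_cols):
            left, right = c * width // n_cols, (c + 1) * width // n_cols
            zones.append([row[left:right] for row in image[top:bottom]])
    return zones


def sum_rule_fusion(score_sets):
    """Sum per-class scores across classifiers and pick the best class.

    score_sets[k][j] is the score classifier k assigns to class j.
    Assumes the scores are already on a comparable scale.
    """
    n_classes = len(score_sets[0])
    fused = [sum(scores[j] for scores in score_sets) for j in range(n_classes)]
    winner = max(range(n_classes), key=fused.__getitem__)
    return winner, fused


# Toy usage: a 4 x 6 "spectrogram" cut into a 2 x 3 grid of zones, and
# three classifiers (e.g. one per descriptor) scoring three classes.
toy_image = [[x + 10 * y for x in range(6)] for y in range(4)]
zones = split_into_subwindows(toy_image, n_rows=2, n_cols=3)

scores = [
    [0.2, 0.5, 0.3],
    [0.6, 0.1, 0.3],
    [0.1, 0.4, 0.5],
]
winner, fused = sum_rule_fusion(scores)
print(len(zones), winner)
```

In this toy run no single classifier dominates, yet the summed scores favor the last class, which is the kind of behavior a sum-rule ensemble relies on.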
Pages: 49-56
Page count: 8
Related papers
44 in total
[1]  
[Anonymous], 2014, P INT C COMP GRAPH V
[2]  
[Anonymous], 2006, TECHNICAL REPORT
[3]  
[Anonymous], 2013, IB C PATT REC
[4]  
Biagio S.M., 2013, IEEE INT C COMP VIS, V2, P809
[5]  
Cao C., 2009, MIREX INT C MUS INF
[6]  
Costa CHL, 2004, IEEE SYS MAN CYBERN, P562
[7]   Music genre classification using LBP textural features [J].
Costa, Y. M. G. ;
Oliveira, L. S. ;
Koerich, A. L. ;
Gouyon, F. ;
Martins, J. G. .
SIGNAL PROCESSING, 2012, 92 (11) :2723-2737
[8]  
Costa Y.M.G., 2012, INT JOINT C NEUR NET, P1867
[9]  
Costa Y. M. G., 2011, 2011 18th International Conference on Systems, Signals and Image Processing, P1
[10]  
Ellis D. P., 2009, Gammatone-Like Spectrograms