An Ensemble of Convolutional Neural Networks for Audio Classification

Cited by: 60
Authors
Nanni, Loris [1]
Maguolo, Gianluca [1]
Brahnam, Sheryl [2]
Paci, Michelangelo [3]
Affiliations
[1] Univ Padua, Dept Informat Engn, I-35122 Padua, Italy
[2] Missouri State Univ, Dept Informat Technol & Cybersecur, Springfield, MO 65804 USA
[3] Tampere Univ, Fac Med & Hlth Technol, BioMediTech, Arvo Ylpon Katu 34, FI-33520 Tampere, Finland
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 13
Keywords
audio classification; data augmentation; ensemble of classifiers; pattern recognition; TIME-SCALE MODIFICATION; TEXTURE CLASSIFICATION; ACOUSTIC FEATURES; DATA AUGMENTATION;
DOI
10.3390/app11135796
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Research in sound classification and recognition is rapidly advancing in the field of pattern recognition. One important area in this field is environmental sound recognition, whether it concerns the identification of endangered species in different habitats or the type of interfering noise in urban environments. Since environmental audio datasets are often limited in size, a robust model able to perform well across different datasets is of strong research interest. In this paper, ensembles of classifiers are combined that exploit six data augmentation techniques and four signal representations for retraining five pre-trained convolutional neural networks (CNNs); these ensembles are tested on three freely available environmental audio benchmark datasets: (i) bird calls, (ii) cat sounds, and (iii) the Environmental Sound Classification (ESC-50) database for identifying sources of noise in environments. To the best of our knowledge, this is the most extensive study investigating ensembles of CNNs for audio classification. The best-performing ensembles are compared and shown to either outperform or perform comparably to the best methods reported in the literature on these datasets, including on the challenging ESC-50 dataset. We obtained a 97% accuracy on the bird dataset, 90.51% on the cat dataset, and 88.65% on ESC-50 using different approaches. In addition, the same ensemble model trained on the three datasets managed to reach the same results on the bird and cat datasets while losing only 0.1% on ESC-50. Thus, we have managed to create an off-the-shelf ensemble that can be trained on different datasets and reach performance competitive with the state of the art.
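The abstract does not say how the individual CNN outputs are combined into an ensemble decision. As an illustration only, the sketch below assumes score-level fusion by averaging (often called the sum rule), one common way to build such ensembles. The function name `sum_rule_fusion` and all score values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def sum_rule_fusion(score_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Fuse per-model class scores by averaging them, then pick the top class.

    score_sets[m][i][c] is the (hypothetical) score that model m assigns to
    class c for sample i, for example a softmax probability.
    """
    if not score_sets:
        raise ValueError("need at least one model's scores")
    n_models = len(score_sets)
    n_samples = len(score_sets[0])
    predictions = []
    for i in range(n_samples):
        n_classes = len(score_sets[0][i])
        # Average the scores of all models for sample i, class by class.
        fused = [
            sum(score_sets[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        # The predicted label is the index of the largest fused score.
        predictions.append(max(range(n_classes), key=fused.__getitem__))
    return predictions


# Made-up scores from three models on two samples with three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]]
model_c = [[0.5, 0.2, 0.3], [0.2, 0.2, 0.6]]
print(sum_rule_fusion([model_a, model_b, model_c]))  # prints [0, 2]
```

In the second sample, `model_a` alone would choose class 1, but the averaged scores favour class 2. This is the kind of disagreement that fusion is meant to resolve.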
Pages: 18