A novel spiral pattern and 2D M4 pooling based environmental sound classification method

Times cited: 11
Authors
Tuncer, Turker [1]
Subasi, Abdulhamit [2]
Ertam, Fatih [1]
Dogan, Sengul [1]
Affiliations
[1] Firat Univ, Technol Fac, Dept Digital Forens Engn, Elazig, Turkey
[2] Effat Univ, Coll Engn, Dept Informat Syst, Jeddah, Saudi Arabia
Keywords
Environmental sound classification; Spiral pattern; 2D M4 pooling; Deep neural network; Machine learning; Digital forensics; Convolutional neural networks; Recognition; Binary; System
DOI
10.1016/j.apacoust.2020.107508
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Environmental sound classification (ESC) is one of the crucial problems in signal processing, digital forensics and machine learning. Several ESC methods have been proposed to obtain highly accurate models. In this work, a novel multileveled ESC method is presented. The method uses two novel algorithms, namely the Spiral Pattern and two-dimensional maximum, minimum, median and mean (2D-M4) pooling. Using these two methods, a 9-level feature generation approach is presented. Since the proposed Spiral Pattern has nine arrows, it extracts 9 and 18 bits using the signum and ternary functions, respectively. As a result, 1536 features are extracted at each level, and 15,360 features in total are generated from the 0th to the 9th level. To select the discriminative features, neighbourhood component analysis (NCA) is used, and the 700 most distinctive features are selected. In the classification phase, a deep neural network is trained and tested on the ESC-10 and ESC-50 datasets. Average classification accuracies of 98.75% and 85.75% were achieved with 10-fold cross-validation on the ESC-10 and ESC-50 datasets, respectively. The experimental results show that the proposed Spiral Pattern and 2D-M4 pooling based ESC method is superior to the human auditory system (HAS) for environmental sound classification. (C) 2020 Elsevier Ltd. All rights reserved.
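The abstract states only that the Spiral Pattern has nine arrows and yields 9 bits (signum) and 18 bits (ternary). The sketch below shows one plausible reading of that description in the style of local binary/ternary patterns. It is not the authors' implementation. Several details are assumptions: the 4x4 block, the hypothetical spiral layout `SPIRAL_CELLS`, the overlapping stride-1 blocks, the difference direction (head minus tail), and the `threshold` of the ternary function. Each 9-bit code is histogrammed into 2^9 = 512 bins, and the ternary bits are split into an "upper" and a "lower" 9-bit code. This split gives 512 + 512 + 512 = 1536 features, matching the per-level count stated above.

```python
import numpy as np

# HYPOTHETICAL arrow layout (not taken from the paper): the first ten cells of a
# clockwise inward spiral over a 4x4 block; consecutive cells give 9 arrows.
SPIRAL_CELLS = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3),
                (2, 3), (3, 3), (3, 2), (3, 1), (3, 0)]
ARROWS = list(zip(SPIRAL_CELLS[:-1], SPIRAL_CELLS[1:]))  # 9 (tail, head) pairs
BLOCK = 4  # assumed block size


def bits_to_int(bits):
    """Pack 0/1 values (first bit = most significant) into an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | int(b)
    return value


def spiral_features(matrix, threshold):
    """Return 1536 histogram features: signum (512) + ternary upper (512) + ternary lower (512)."""
    rows, cols = matrix.shape
    hist_sign = np.zeros(512, dtype=int)
    hist_upper = np.zeros(512, dtype=int)
    hist_lower = np.zeros(512, dtype=int)
    # Slide the block over the matrix with stride 1 (an assumption).
    for r in range(rows - BLOCK + 1):
        for c in range(cols - BLOCK + 1):
            block = matrix[r:r + BLOCK, c:c + BLOCK]
            diffs = [block[head] - block[tail] for tail, head in ARROWS]
            sign_bits = [d >= 0 for d in diffs]           # signum function -> 9 bits
            upper_bits = [d > threshold for d in diffs]   # ternary value +1 -> 9 bits
            lower_bits = [d < -threshold for d in diffs]  # ternary value -1 -> 9 bits
            hist_sign[bits_to_int(sign_bits)] += 1
            hist_upper[bits_to_int(upper_bits)] += 1
            hist_lower[bits_to_int(lower_bits)] += 1
    return np.concatenate([hist_sign, hist_upper, hist_lower])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(1024)
    matrix = signal.reshape(32, 32)  # assumed 1D-to-2D reshaping of the audio signal
    features = spiral_features(matrix, threshold=0.5 * signal.std())
    print(features.shape)  # (1536,)
```

Histogram features make the output length independent of the signal length. This is why every level can contribute the same 1536 values regardless of how much earlier pooling has shrunk the input.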
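The feature counts in the abstract can be checked with a short calculation. This assumes, as is usual for binary and ternary pattern descriptors, that each 9-bit code is summarized by a 2^9-bin histogram and that the 18 ternary bits form two separate 9-bit codes (upper and lower). The abstract does not spell out this reading, but it reproduces the stated numbers exactly:

```latex
\begin{aligned}
\underbrace{2^{9}}_{\text{signum}} + \underbrace{2 \times 2^{9}}_{\text{ternary (upper, lower)}}
  &= 512 + 1024 = 1536 \ \text{features per level}, \\
1536 \times \underbrace{10}_{\text{levels } 0,\dots,9} &= 15{,}360 \ \text{features in total}.
\end{aligned}
```

NCA then reduces these 15,360 features to the 700 most distinctive ones before classification.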
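2D-M4 pooling computes the maximum, minimum, median and mean of each window of a 2D input. The minimal sketch below assumes non-overlapping 2x2 windows and drops any trailing row or column that does not fill a window. The window size and the function name `m4_pool_2d` are assumptions. The abstract does not say how the four pooled outputs are chained across the nine levels, so the sketch stops at a single pooling step.

```python
import numpy as np


def m4_pool_2d(matrix, size=2):
    """Max, min, median and mean pooling over non-overlapping size x size windows.

    The default size of 2 is an assumption; rows/columns that do not fill a
    complete window are discarded.
    """
    rows = matrix.shape[0] // size
    cols = matrix.shape[1] // size
    trimmed = matrix[:rows * size, :cols * size]
    # Rearrange to (rows, cols, size*size) so each window's values lie on the last axis.
    windows = (trimmed.reshape(rows, size, cols, size)
                      .transpose(0, 2, 1, 3)
                      .reshape(rows, cols, size * size))
    return {
        "max": windows.max(axis=-1),
        "min": windows.min(axis=-1),
        "median": np.median(windows, axis=-1),
        "mean": windows.mean(axis=-1),
    }


if __name__ == "__main__":
    m = np.arange(16, dtype=float).reshape(4, 4)
    pooled = m4_pool_2d(m)
    print(pooled["max"])  # [[ 5.  7.] [13. 15.]]
```

Each pooling step halves both dimensions under this assumption. Applying it repeatedly produces progressively coarser inputs, from which further levels of features can be extracted.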
Pages: 11