A new pyramidal concatenated CNN approach for environmental sound classification

Cited by: 77
Authors
Demir, Fatih [1]
Turkoglu, Muammer [2]
Aslan, Muzaffer [3]
Sengur, Abdulkadir [1]
Affiliations
[1] Firat Univ, Elect & Elect Engn Dept, TR-23000 Elazig, Turkey
[2] Bingol Univ, Comp Engn Dept, TR-12000 Bingol, Turkey
[3] Bingol Univ, Elect & Elect Engn Dept, TR-12000 Bingol, Turkey
Keywords
Sound classification; Deep learning; SVM; STFT; CNN
DOI
10.1016/j.apacoust.2020.107520
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recently, there has been growing interest in Environmental Sound Classification (ESC), an important topic within non-speech audio classification. This study proposes a novel approach based on deep Convolutional Neural Networks (CNNs). The approach consists of five stages: pre-processing, deep-learning-based feature extraction, feature concatenation, feature reduction, and classification. In the first stage, the input sound signals are denoised and converted into sound images using the Short-Time Fourier Transform (STFT). Pre-trained CNN models (VGG16, VGG19 and DenseNet201) are then used to extract deep features from these sound images. The features are extracted in a pyramidal fashion, which makes the concatenated feature vector very high-dimensional. A feature selection mechanism is therefore applied after the concatenation stage, both to reduce the dimensionality and to retain the most effective features. In the last stage, a Support Vector Machine (SVM) classifier is used. The performance of the proposed method is evaluated on the ESC-10, ESC-50 and UrbanSound8K datasets, where it achieves accuracy scores of 94.8%, 81.4% and 78.14%, respectively. These results are also compared with those of state-of-the-art methods. © 2020 Elsevier Ltd. All rights reserved.
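The abstract states only that denoised recordings are turned into "sound images" with the STFT. It does not give the window type, frame length, hop size, scaling or denoising method. Below is a minimal pure-Python sketch of that conversion under assumed settings: a periodic Hann window, 256-sample frames, a 128-sample hop, dB scaling and 0 to 255 normalization. The function names `hann_window`, `stft_magnitude` and `to_sound_image` are hypothetical, denoising is omitted, and this is not the authors' implementation.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=256, hop=128):
    """Return |STFT| as a list of frames, each with frame_len // 2 + 1 bins.

    A naive O(frame_len^2) DFT is used per frame for readability; a real
    pipeline would call an FFT routine instead. frame_len and hop are
    assumed values, not taken from the paper.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies only (real input)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + t] * window[t] for t in range(frame_len)]
        row = []
        for k in range(n_bins):
            coeff = sum(
                seg[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                for t in range(frame_len)
            )
            row.append(abs(coeff))
        frames.append(row)
    return frames


def to_sound_image(mag, eps=1e-10):
    """Convert |STFT| frames into an 8-bit grey-level "sound image".

    Magnitudes are converted to dB, min-max scaled to 0..255 and transposed
    so rows are frequency bins (row 0 = DC) and columns are time frames.
    """
    db = [[20.0 * math.log10(v + eps) for v in row] for row in mag]
    lo = min(min(row) for row in db)
    hi = max(max(row) for row in db)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant input
    n_frames, n_bins = len(db), len(db[0])
    return [
        [round(255 * (db[t][k] - lo) / span) for t in range(n_frames)]
        for k in range(n_bins)
    ]
```

For example, a 440 Hz tone sampled at 8 kHz gives a bin spacing of 8000 / 256 = 31.25 Hz, so the brightest row in every column of `to_sound_image(stft_magnitude(tone))` is row 14 (440 / 31.25 ≈ 14.1). Before feature extraction, such an image would typically be resized to the input size expected by the pretrained CNNs (224 x 224 pixels for VGG16), and an FFT would replace the naive DFT.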
Pages: 7