A new pyramidal concatenated CNN approach for environmental sound classification

Cited by: 72
Authors
Demir, Fatih [1]
Turkoglu, Muammer [2]
Aslan, Muzaffer [3]
Sengur, Abdulkadir [1]
Affiliations
[1] Firat Univ, Elect & Elect Engn Dept, TR-23000 Elazig, Turkey
[2] Bingol Univ, Comp Engn Dept, TR-12000 Bingol, Turkey
[3] Bingol Univ, Elect & Elect Engn Dept, TR-12000 Bingol, Turkey
Keywords
Sound classification; Deep learning; SVM; STFT; CNN
DOI
10.1016/j.apacoust.2020.107520
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Recently, there has been growing interest in Environmental Sound Classification (ESC), an important topic in non-speech audio classification. This study proposes a novel approach based on deep Convolutional Neural Networks (CNNs). The approach comprises several stages: pre-processing, deep-learning-based feature extraction, feature concatenation, feature reduction and classification. In the first stage, the input sound signals are denoised and converted into sound images using the Short-Time Fourier Transform (STFT). Pre-trained CNN models are then used for deep feature extraction; VGG16, VGG19 and DenseNet201 are considered. Feature extraction is performed in a pyramidal fashion, which makes the feature vector quite large. To reduce the dimension and identify the most effective features, a feature selection mechanism is applied after the feature concatenation stage. In the last stage, a Support Vector Machine (SVM) classifier is used. The method is evaluated on the ESC-10, ESC-50 and UrbanSound8K datasets, where it achieves accuracy scores of 94.8%, 81.4% and 78.14%, respectively. The results are also compared with those of state-of-the-art methods. (C) 2020 Elsevier Ltd. All rights reserved.
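The first stage described in the abstract, converting a sound signal into an STFT "sound image", can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `stft_image`, the Hann window, the frame length of 256 samples, the hop of 128 samples, the log-magnitude scaling and the synthetic 1 kHz test tone are all assumptions chosen for the example, since the abstract gives none of these settings. Denoising, CNN feature extraction, feature selection and the SVM stage are not shown.

```python
import numpy as np


def stft_image(signal, frame_len=256, hop=128):
    """Return a log-magnitude STFT image of shape (freq_bins, frames).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)  # assumed window type
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT of every frame: shape (frames, frame_len // 2 + 1).
    spectrum = np.fft.rfft(frames, axis=1)
    # Log-magnitude in dB; the small constant avoids log(0).
    log_mag = 20.0 * np.log10(np.abs(spectrum) + 1e-10)
    return log_mag.T  # frequency on the vertical axis, time on the horizontal


# Synthetic stand-in for an environmental sound clip: 1 s of a 1 kHz tone.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)

image = stft_image(tone)
peak_bin = int(np.argmax(image.mean(axis=1)))
peak_hz = peak_bin * sr / 256  # bin spacing is sr / frame_len
print(image.shape, peak_hz)
```

In a pipeline like the one the abstract outlines, an image of this kind would typically be resized and fed to the pre-trained CNNs, whose feature vectors are then concatenated before selection and classification.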
Pages: 7