A CNN Sound Classification Mechanism Using Data Augmentation

Cited by: 11
Authors
Chu, Hung-Chi [1 ]
Zhang, Young-Lin [1 ]
Chiang, Hao-Chu [1 ]
Affiliations
[1] Chaoyang Univ Technol, Dept Informat & Commun Engn, Taichung 41349, Taiwan
Keywords
sound classification; signal processing; CNN; CONVOLUTIONAL NEURAL-NETWORKS;
DOI
10.3390/s23156972
Chinese Library Classification
O65 [Analytical Chemistry];
Subject Classification Codes
070302; 081704;
Abstract
Sound classification is widely used in many fields, and compared with traditional signal-processing methods, deep learning is one of the most feasible and effective approaches to it. However, classification performance suffers when the training dataset is limited by cost and resource constraints, data imbalance, or annotation issues. We therefore propose a sound classification mechanism based on convolutional neural networks (CNNs) that uses Mel-Frequency Cepstral Coefficient (MFCC) feature extraction to convert sound signals into spectrograms, which are well suited as CNN input. For data augmentation, the number of spectrograms can be increased by varying the number of triangular bandpass filters. The experimental results show that the ESC-50 dataset contains 50 semantic categories whose classes are complex and whose data are insufficient, yielding a classification accuracy of only 63%; with the proposed data augmentation (K = 5), the accuracy rises to 97%. In the UrbanSound8K dataset, the data are sufficient, so the classification accuracy reaches 90%, and data augmentation raises it slightly to 92%. Moreover, when only 50% of the training dataset is used together with data augmentation, building the training model is accelerated and the classification accuracy still reaches 91%.
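The augmentation idea in the abstract, producing several mel-scale spectrogram variants of one signal by changing the number of triangular bandpass filters, can be sketched as below. This is a minimal numpy-only illustration, not the authors' implementation: the `mel_filterbank` helper, the FFT size, the sample rate, and the particular filter counts are all illustrative assumptions.

```python
import numpy as np

def mel_filterbank(num_filters, n_fft=512, sr=16000):
    """Build `num_filters` triangular bandpass filters spaced on the mel scale.

    Returns an array of shape (num_filters, n_fft // 2 + 1).
    """
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Equally spaced points on the mel scale, mapped back to FFT bin indices
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)

    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def augmented_mel_spectra(power_spec, filter_counts=(20, 26, 32, 40, 48)):
    """One power spectrum -> K log-mel variants, one per filter-bank size."""
    return [np.log(mel_filterbank(k) @ power_spec + 1e-10)
            for k in filter_counts]

# Toy power spectrum standing in for one STFT frame (257 bins for n_fft=512)
rng = np.random.default_rng(0)
spec = rng.random(257)
variants = augmented_mel_spectra(spec)
print([v.shape for v in variants])  # → [(20,), (26,), (32,), (40,), (48,)]
```

Each variant summarizes the same frame at a different mel resolution, so a single labeled recording yields K distinct training inputs; the paper's K = 5 setting corresponds to using five such filter-bank sizes.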
Pages: 18