A Lightweight Channel and Time Attention Enhanced 1D CNN Model for Environmental Sound Classification

Cited by: 15
Authors
Xu, Huaxing [1]
Tian, Yunzhi [1]
Ren, Haichuan [1]
Liu, Xudong [1]
Affiliations
[1] Zhengzhou Univ, Sch Elect & Informat Engn, Zhengzhou 450001, Henan, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Environmental sound classification; 1D CNN; Attention; Snapshot ensemble; NEURAL-NETWORKS; AUDIO; RECOGNITION;
DOI
10.1016/j.eswa.2024.123768
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
One-dimensional convolutional neural networks (1D CNNs) that take raw waveforms directly as input are less competitive than 2D CNNs at recognizing environmental sounds. To overcome this disadvantage, we propose a novel lightweight 1D CNN structure that employs attention mechanisms and achieves significant improvements in both accuracy and computational complexity. Concretely, (1) two attention modules are constructed along the channel and time dimensions separately and then combined to give an intermediate feature map that focuses on key frequency bands and semantically related time-frame information. (2) Without increasing training overhead, a snapshot ensemble is employed to further improve performance. Results on two benchmark datasets (UrbanSound8K, ESC-10) demonstrate that, by employing the attention mechanism, our model outperforms all previously reported 1D CNN approaches in accuracy with fewer parameters. Meanwhile, with this performance gain, the proposed model is superior to most existing spectral-based 2D CNN approaches and competitive with state-of-the-art (SOTA) performance, while using orders of magnitude fewer parameters. Overall, these results indicate that our model is compact and has good potential for practical resource-limited applications, such as sound recognition on embedded platforms.
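The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter-free sigmoid gates, the multiplicative combination, and the function names `channel_time_attention` and `snapshot_ensemble` are all assumptions made for illustration, and the published modules most likely use learned weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_time_attention(fmap):
    """Reweight a 1D CNN feature map given as C channels x T time frames.

    Assumption: each gate is a parameter-free sigmoid of a mean; a real
    module would normally insert learned layers before the sigmoid.
    """
    n_ch, n_t = len(fmap), len(fmap[0])
    # Channel gate: average each channel over time, squash to (0, 1).
    ch_gate = [sigmoid(sum(row) / n_t) for row in fmap]
    # Time gate: average each frame over channels, squash to (0, 1).
    t_gate = [sigmoid(sum(fmap[c][t] for c in range(n_ch)) / n_ch)
              for t in range(n_t)]
    # Combine both gates multiplicatively (one possible way to merge them).
    return [[fmap[c][t] * ch_gate[c] * t_gate[t] for t in range(n_t)]
            for c in range(n_ch)]


def snapshot_ensemble(prob_vectors):
    """Average class probabilities from several saved snapshots.

    Returns the averaged vector and the index of the winning class.
    """
    n = len(prob_vectors)
    n_cls = len(prob_vectors[0])
    avg = [sum(p[k] for p in prob_vectors) / n for k in range(n_cls)]
    return avg, max(range(n_cls), key=avg.__getitem__)
```

Calling `channel_time_attention` on a small nested list returns a feature map of the same shape with each activation scaled by its channel and time gates. `snapshot_ensemble` expects one probability vector per snapshot, all taken from a single training run, which is why the ensemble adds no extra training cost.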
Pages: 10
Related papers
75 entries in total
[1]   End-to-end environmental sound classification using a 1D convolutional neural network [J].
Abdoli, Sajjad ;
Cardinal, Patrick ;
Koerich, Alessandro Lameiras .
EXPERT SYSTEMS WITH APPLICATIONS, 2019, 136 :252-263
[2]   A novel acoustic scene classification model using the late fusion of convolutional neural networks and different ensemble classifiers [J].
Alamir, Mahmoud A. .
APPLIED ACOUSTICS, 2021, 175
[3]
[Anonymous], 2018, Computational analysis of sound scenes and events, DOI 10.1007/978-3-319-63450-0
[4]  
Atito S, 2024, arXiv, arXiv:2211.13189
[5]   The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music [J].
Aucouturier, Jean-Julien ;
Defreville, Boris ;
Pachet, Francois .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2007, 122 (02) :881-891
[6]  
Aytar Y, 2016, ADV NEUR IN, V29
[7]   CNN-RNN and Data Augmentation Using Deep Convolutional Generative Adversarial Network for Environmental Sound Classification [J].
Bahmei, Behnaz ;
Birmingham, Elina ;
Arzanpour, Siamak .
IEEE SIGNAL PROCESSING LETTERS, 2022, 29 :682-686
[8]   TimeScaleNet: A Multiresolution Approach for Raw Audio Recognition Using Learnable Biquadratic IIR Filters and Residual Networks of Depthwise-Separable One-Dimensional Atrous Convolutions [J].
Bavu, Eric ;
Ramamonjy, Aro ;
Pujol, Hadrien ;
Garcia, Alexandre .
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (02) :220-235
[9]   Feature Learning With Matrix Factorization Applied to Acoustic Scene Classification [J].
Bisot, Victor ;
Serizel, Romain ;
Essid, Slim ;
Richard, Gael .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (06) :1216-1229
[10]   Classifying environmental sounds using image recognition networks [J].
Boddapati, Venkatesh ;
Petef, Andrej ;
Rasmusson, Jim ;
Lundberg, Lars .
KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS, 2017, 112 :2048-2056