A Lightweight Channel and Time Attention Enhanced 1D CNN Model for Environmental Sound Classification

Cited by: 10
Authors
Xu, Huaxing [1]
Tian, Yunzhi [1]
Ren, Haichuan [1]
Liu, Xudong [1]
Affiliations
[1] Zhengzhou Univ, Sch Elect & Informat Engn, Zhengzhou 450001, Henan, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Environmental sound classification; 1D CNN; Attention; Snapshot ensemble; NEURAL-NETWORKS; AUDIO; RECOGNITION;
DOI
10.1016/j.eswa.2024.123768
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One-dimensional convolutional neural networks (1D CNNs) that take raw waveforms directly as input have been less competitive than 2D CNNs at recognizing environmental sound. To overcome this disadvantage, we propose a novel lightweight 1D CNN structure that employs an attention mechanism and yields significant improvements in both accuracy and computational complexity. Concretely, (1) two attention modules are constructed along the channel and time dimensions separately and combined to give an intermediate feature map that focuses on key frequency bands and semantically related time-frame information. (2) Without increasing training overhead, snapshot ensembling is employed to further improve performance. Results on two benchmark datasets (UrbanSound8k, ESC-10) demonstrate that, by employing the attention mechanism, our model outperforms all previously reported 1D CNN approaches in accuracy with fewer parameters. Moreover, with this performance gain, the proposed model is superior to most existing spectral-based 2D CNN approaches and competitive with state-of-the-art (SOTA) performance while using orders of magnitude fewer parameters. Overall, the results indicate that our model is compact and has good potential for practical resource-limited applications, such as sound recognition on embedded platforms.
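The abstract does not say how the two attention modules are built or combined, so the following is only a minimal sketch of the general idea, not the authors' architecture. It assumes a feature map of shape channels x time frames, a squeeze-and-excite style gate per channel, a sigmoid gate per time frame, and multiplicative combination. The function names and the weights `w` and `v` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, w):
    """Return one gate in (0, 1) per channel.

    feat: C x T feature map as a list of lists.
    w:    hypothetical C x C weight matrix (assumption, not from the paper).
    """
    # Squeeze: average each channel over the time axis.
    pooled = [sum(row) / len(row) for row in feat]
    # Excite: a single linear layer followed by a sigmoid.
    n = len(pooled)
    return [sigmoid(sum(w[i][j] * pooled[j] for j in range(n))) for i in range(n)]


def time_attention(feat, v):
    """Return one gate in (0, 1) per time frame.

    v: hypothetical length-C weight vector (assumption, not from the paper).
    """
    num_channels, num_frames = len(feat), len(feat[0])
    return [
        sigmoid(sum(v[c] * feat[c][t] for c in range(num_channels)))
        for t in range(num_frames)
    ]


def apply_attention(feat, w, v):
    """Rescale the feature map by both gates (multiplicative combination assumed)."""
    ch_gate = channel_attention(feat, w)
    tm_gate = time_attention(feat, v)
    return [
        [feat[c][t] * ch_gate[c] * tm_gate[t] for t in range(len(tm_gate))]
        for c in range(len(ch_gate))
    ]


# Toy input: 2 channels, 3 time frames, all-zero weights (every gate is 0.5).
feat = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
w = [[0.0, 0.0], [0.0, 0.0]]
v = [0.0, 0.0]
out = apply_attention(feat, w, v)
```

The output keeps the input shape, which is what lets such a module sit between convolutional layers and produce an intermediate feature map; in a trained network `w` and `v` would be learned rather than fixed.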
Pages: 10
Related papers
75 in total
  • [1] End-to-end environmental sound classification using a 1D convolutional neural network
    Abdoli, Sajjad
    Cardinal, Patrick
    Koerich, Alessandro Lameiras
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2019, 136 : 252 - 263
  • [2] A novel acoustic scene classification model using the late fusion of convolutional neural networks and different ensemble classifiers
    Alamir, Mahmoud A.
    [J]. APPLIED ACOUSTICS, 2021, 175
  • [3] [Anonymous], 2018, Proc. Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop
  • [4] Atito S, 2024, Arxiv, DOI arXiv:2211.13189
  • [5] The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music
    Aucouturier, Jean-Julien
    Defreville, Boris
    Pachet, Francois
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2007, 122 (02) : 881 - 891
  • [6] Aytar Y, 2016, ADV NEUR IN, V29
  • [7] CNN-RNN and Data Augmentation Using Deep Convolutional Generative Adversarial Network for Environmental Sound Classification
    Bahmei, Behnaz
    Birmingham, Elina
    Arzanpour, Siamak
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 682 - 686
  • [8] TimeScaleNet: A Multiresolution Approach for Raw Audio Recognition Using Learnable Biquadratic IIR Filters and Residual Networks of Depthwise-Separable One-Dimensional Atrous Convolutions
    Bavu, Eric
    Ramamonjy, Aro
    Pujol, Hadrien
    Garcia, Alexandre
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (02) : 220 - 235
  • [9] Feature Learning With Matrix Factorization Applied to Acoustic Scene Classification
    Bisot, Victor
    Serizel, Romain
    Essid, Slim
    Richard, Gael
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (06) : 1216 - 1229
  • [10] Classifying environmental sounds using image recognition networks
    Boddapati, Venkatesh
    Petef, Andrej
    Rasmusson, Jim
    Lundberg, Lars
    [J]. KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS, 2017, 112 : 2048 - 2056