Improved Patch-Mix Transformer and Contrastive Learning Method for Sound Classification in Noisy Environments

Cited by: 0
Authors
Chen, Xu [1]
Wang, Mei [1, 2]
Kan, Ruixiang [3]
Qiu, Hongbing [3]
Affiliations
[1] Guilin Univ Technol, Coll Comp Sci & Engn, Guilin 541006, Peoples R China
[2] Guilin Univ Technol, Coll Phys & Elect Informat Engn, Guilin 541006, Peoples R China
[3] Guilin Univ Elect Technol, Sch Informat & Commun, Guilin 541006, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 21
Funding
National Natural Science Foundation of China
Keywords
data augmentation; contrastive learning; feature fusion; deep learning; transformer; urban environmental sound recognition
DOI
10.3390/app14219711
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
In urban environments, noise significantly affects daily life and poses challenges for Environmental Sound Classification (ESC). Urban noise alters the structure of audio signals, which complicates both feature extraction and classification. To address these challenges, this paper proposes a Contrastive Learning-based Audio Spectrogram Transformer (CL-Transformer) that incorporates a Patch-Mix mechanism and an adaptive contrastive learning strategy, and that is trained with improved adaptive data augmentation techniques. First, a combination of data augmentation techniques is introduced to enrich the environmental sounds. Then, the Patch-Mix feature fusion scheme randomly mixes patches of the enhanced and noisy spectrograms during the Transformer's patch embedding. Furthermore, a novel contrastive learning scheme is introduced to quantify the loss and improve model performance, and it works well in combination with the Transformer model. Finally, experiments on the public ESC-50 and UrbanSound8K datasets achieve accuracies of 97.75% and 92.95%, respectively. To simulate the impact of noise in real urban environments, the model is also evaluated on the UrbanSound8K dataset with background noise added at different signal-to-noise ratios (SNRs). The experimental results demonstrate that the proposed framework performs well in noisy environments.
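The abstract mentions two operations that are easy to illustrate: adding background noise at a chosen SNR, and randomly mixing patches from two spectrograms. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names `mix_at_snr` and `patch_mix`, the `mix_ratio` parameter, and the choice to tile the noise to the signal length are all hypothetical; the paper's actual Patch-Mix operates inside the Transformer's patch embedding and may differ in detail.

```python
import numpy as np


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the result has the requested SNR in dB.

    Assumption: the noise is tiled or trimmed to match the signal length.
    """
    signal = np.asarray(signal, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    reps = int(np.ceil(len(signal) / len(noise)))
    noise = np.tile(noise, reps)[: len(signal)]

    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the target noise power.
    target_p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_p_noise / p_noise)
    return signal + scale * noise


def patch_mix(patches_a, patches_b, mix_ratio, rng):
    """Replace a random subset of the patches in `patches_a` with those of `patches_b`.

    Both inputs have shape (num_patches, embed_dim). Returns the mixed patches
    and a boolean mask marking which rows were taken from `patches_b`.
    """
    num_patches = patches_a.shape[0]
    num_mixed = int(round(mix_ratio * num_patches))
    idx = rng.choice(num_patches, size=num_mixed, replace=False)
    mask = np.zeros(num_patches, dtype=bool)
    mask[idx] = True
    mixed = np.where(mask[:, None], patches_b, patches_a)
    return mixed, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 440.0 * t)      # stand-in for a clean clip
    background = rng.standard_normal(1000)      # stand-in for urban noise
    noisy = mix_at_snr(clean, background, snr_db=10.0)

    a = rng.standard_normal((8, 4))             # patches of one spectrogram
    b = rng.standard_normal((8, 4))             # patches of another
    mixed, mask = patch_mix(a, b, mix_ratio=0.25, rng=rng)
    print(noisy.shape, mixed.shape, int(mask.sum()))
```

In a contrastive setup, the mask returned by `patch_mix` could be used to decide which source each mixed patch should be pulled toward, but how the paper weights that loss is not stated in the abstract.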
Pages: 18