Self-supervised learning representation for abnormal acoustic event detection based on attentional contrastive learning

Cited by: 1
Authors
Wei, Juan [1]
Zhang, Qian [1]
Ning, Weichen [2]
Affiliations
[1] Xidian Univ, Sch Commun Engn, Xian 710071, Peoples R China
[2] Hong Kong Polytech Univ, Fac Engn, Dept Comp, HongKong 100872, Peoples R China
Keywords
Contrastive learning; Self-supervised learning; Attention mechanism; Abnormal acoustic event detection; FUSION
DOI
10.1016/j.dsp.2023.104199
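Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract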
Most abnormal acoustic event detection (AAED) relies on supervised training of deep learning models, but manually labeled samples are costly and scarce. This work proposes a self-supervised learning representation for AAED based on contrastive learning to overcome this problem. Auditory and visual data augmentations are applied simultaneously to create positive sample pairs. An attention mechanism is introduced into the encoder during self-supervised pre-training. Features fused by discriminant correlation analysis are compared with a single feature to verify the feature-extraction ability of the self-supervised pre-trained model. Pre-training is carried out on a noisy abnormal acoustic dataset. The results show that the self-supervised pre-trained model achieves an accuracy of 87.72% in linear evaluation and 88.70% in the downstream task on a small, clean AAED dataset, exceeding the results of supervised learning. This work reduces the demand for abnormal acoustic event labels. © 2023 Published by Elsevier Inc.
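The abstract does not state which contrastive objective is used. As an illustration only, the following minimal sketch assumes a SimCLR-style NT-Xent loss over a batch of positive pairs, where `z_a[i]` and `z_b[i]` stand for the encoder outputs of two augmented views of the same audio clip. The function name `nt_xent_loss`, the `temperature` value, and the use of NumPy are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Hypothetical NT-Xent loss for N positive pairs (z_a[i], z_b[i])."""
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # stack views: (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalise each row
    sim = z @ z.T / temperature                        # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # a sample is not its own negative
    # Row i's positive is the other view of the same clip.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Usage with synthetic embeddings (assumed shapes: 8 clips, 16 dimensions).
rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 16))
loss_aligned = nt_xent_loss(z_a, z_a + 0.01 * rng.normal(size=(8, 16)))
loss_random = nt_xent_loss(z_a, rng.normal(size=(8, 16)))
```

Under this assumed objective, pairs whose two views map to nearby embeddings give a lower loss than unrelated pairs, which is the signal that pre-training on unlabeled audio would exploit.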
Pages: 9