Self-supervised learning representation for abnormal acoustic event detection based on attentional contrastive learning

Cited by: 1
Authors
Wei, Juan [1]
Zhang, Qian [1]
Ning, Weichen [2]
Affiliations
[1] Xidian Univ, Sch Commun Engn, Xian 710071, Peoples R China
[2] Hong Kong Polytech Univ, Fac Engn, Dept Comp, Hong Kong 100872, Peoples R China
Keywords
Contrastive learning; Self-supervised learning; Attention mechanism; Abnormal acoustic event detection; FUSION
DOI
10.1016/j.dsp.2023.104199
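Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract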
Most abnormal acoustic event detection (AAED) relies on supervised training of deep learning models, but manually labeled samples are costly and scarce. This work proposes a self-supervised representation learning approach for AAED based on contrastive learning to address this problem. Auditory and visual data augmentations are applied simultaneously to create positive sample pairs, and an attention mechanism is introduced into the encoder during self-supervised pre-training. Features fused by discriminant correlation analysis are compared with a single feature to verify the feature-extraction ability of the self-supervised pre-trained model. Pre-training is carried out on a noisy abnormal acoustic dataset. The results show that the self-supervised pre-trained model achieves an accuracy of 87.72% in linear evaluation and 88.70% on the downstream task with a small, clean AAED dataset, exceeding the results of supervised learning. This work eases the demand for labeled abnormal acoustic events. (c) 2023 Published by Elsevier Inc.
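The abstract does not say which contrastive objective is used. As a rough illustration of how positive sample pairs drive such training, the sketch below assumes the NT-Xent loss popularized by SimCLR, which may differ from the authors' choice. The names `l2_normalize`, `nt_xent_loss`, and the `temperature` value are hypothetical, and the toy vectors stand in for encoder outputs of two augmented views of the same clips.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss (an assumed objective, not confirmed by the abstract).

    view_a[i] and view_b[i] are embeddings of two augmentations of clip i
    and form a positive pair; every other embedding in the batch is a negative.
    """
    n = len(view_a)
    z = [l2_normalize(v) for v in view_a + view_b]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same clip
        # Scaled similarities to every embedding except the anchor itself.
        logits = {
            k: sum(p * q for p, q in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
            if k != i
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)


# Toy embeddings: matched views point in nearly the same direction.
clips_view_a = [[1.0, 0.0], [0.0, 1.0]]
clips_view_b = [[0.9, 0.1], [0.1, 0.9]]
aligned = nt_xent_loss(clips_view_a, clips_view_b)
# Swapping the second view breaks the pairing, so the loss should rise.
mismatched = nt_xent_loss(clips_view_a, clips_view_b[::-1])
```

Under this assumed objective, the loss is lower when the two views of each clip map to nearby embeddings than when the pairing is broken, which is the property a pre-trained encoder is optimized for.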
Pages: 9