CLIP-TSA: CLIP-ASSISTED TEMPORAL SELF-ATTENTION FOR WEAKLY-SUPERVISED VIDEO ANOMALY DETECTION

Cited by: 27
Authors
Joo, Hyekang Kevin [1]
Khoa Vo [2]
Yamazaki, Kashu [2]
Ngan Le [2]
Affiliations
[1] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[2] Univ Arkansas, Dept Comp Sci & Comp Engn, Fayetteville, AR USA
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Funding
National Science Foundation (USA);
Keywords
video anomaly detection; temporal self-attention; weakly supervised; multimodal model; subtlety;
DOI
10.1109/ICIP49359.2023.10222289
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video anomaly detection (VAD), commonly formulated as a weakly-supervised multiple-instance learning problem because of the labor-intensive nature of annotation, is a challenging problem in video surveillance in which the anomalous frames must be localized in an untrimmed video. In this paper, we first propose to use the ViT-encoded visual features from CLIP, in contrast with the C3D or I3D features conventional in the domain, to efficiently extract discriminative representations. We then model temporal dependencies and nominate the snippets of interest with our proposed Temporal Self-Attention (TSA). An ablation study confirms the effectiveness of TSA and the ViT features. Extensive experiments show that the proposed CLIP-TSA outperforms existing state-of-the-art (SOTA) methods by a large margin on three commonly used VAD benchmark datasets (UCF-Crime, ShanghaiTech Campus, and XD-Violence). Our source code is available at https://github.com/joos2010kj/CLIP-TSA.
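As a rough illustration of the two ideas the abstract names, the sketch below applies plain scaled dot-product self-attention across per-snippet feature vectors and then forms a multiple-instance-learning style video-level score from the top-k snippet scores. It is a generic, weight-free toy and not the authors' TSA module: the function names `self_attention` and `video_score`, the parameter `k`, and the toy feature values are all assumptions made for this example. The authors' actual implementation is in the repository linked above.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(snippets):
    """Scaled dot-product self-attention over snippet features.

    Assumption: no learned query/key/value projections, so each output
    is simply a similarity-weighted average of all snippet features.
    """
    d = len(snippets[0])
    attended = []
    for q in snippets:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d)
                  for key in snippets]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, snippets))
                         for j in range(d)])
    return attended


def video_score(snippet_scores, k=2):
    """MIL-style video-level score: mean of the k highest snippet scores."""
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / len(top)


# Toy stand-ins for per-snippet features (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(features)

# Hypothetical per-snippet anomaly scores from some downstream scorer.
score = video_score([0.1, 0.9, 0.5, 0.7], k=2)
```

Under weak supervision only the video-level label is available, so a score aggregated from the highest-scoring snippets is what gets compared against that label, while the snippet scores themselves provide the temporal localization.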
Pages: 3230-3234
Number of pages: 5