CLIP-TSA: CLIP-ASSISTED TEMPORAL SELF-ATTENTION FOR WEAKLY-SUPERVISED VIDEO ANOMALY DETECTION

Cited by: 8
Authors
Joo, Hyekang Kevin [1]
Khoa Vo [2]
Yamazaki, Kashu [2]
Ngan Le [2]
Affiliations
[1] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[2] Univ Arkansas, Dept Comp Sci & Comp Engn, Fayetteville, AR USA
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Funding
National Science Foundation (USA)
Keywords
video anomaly detection; temporal self-attention; weakly supervised; multimodal model; subtlety;
DOI
10.1109/ICIP49359.2023.10222289
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video anomaly detection (VAD) - commonly formulated as a multiple-instance learning problem in a weakly-supervised manner due to its labor-intensive nature - is a challenging problem in video surveillance where the frames of anomaly need to be localized in an untrimmed video. In this paper, we first propose to utilize the ViT-encoded visual features from CLIP, in contrast with the conventional C3D or I3D features in the domain, to efficiently extract discriminative representations in the novel technique. We then model temporal dependencies and nominate the snippets of interest by leveraging our proposed Temporal Self-Attention (TSA). The ablation study confirms the effectiveness of TSA and ViT feature. The extensive experiments show that our proposed CLIP-TSA outperforms the existing state-of-the-art (SOTA) methods by a large margin on three commonly-used benchmark datasets in the VAD problem (UCF-Crime, ShanghaiTech Campus and XD-Violence). Our source code is available at https://github.com/joos2010kj/CLIP-TSA.
Pages: 3230-3234
Number of pages: 5