CLIP-TSA: CLIP-ASSISTED TEMPORAL SELF-ATTENTION FOR WEAKLY-SUPERVISED VIDEO ANOMALY DETECTION

Cited by: 8
Authors
Joo, Hyekang Kevin [1 ]
Khoa Vo [2 ]
Yamazaki, Kashu [2 ]
Ngan Le [2 ]
Affiliations
[1] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[2] Univ Arkansas, Dept Comp Sci & Comp Engn, Fayetteville, AR USA
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Funding
U.S. National Science Foundation;
Keywords
video anomaly detection; temporal self-attention; weakly supervised; multimodal model; subtlety;
DOI
10.1109/ICIP49359.2023.10222289
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video anomaly detection (VAD) - commonly formulated as a multiple-instance learning problem in a weakly-supervised manner due to its labor-intensive nature - is a challenging problem in video surveillance where the frames of anomaly need to be localized in an untrimmed video. In this paper, we first propose to utilize the ViT-encoded visual features from CLIP, in contrast with the conventional C3D or I3D features in the domain, to efficiently extract discriminative representations in the novel technique. We then model temporal dependencies and nominate the snippets of interest by leveraging our proposed Temporal Self-Attention (TSA). The ablation study confirms the effectiveness of TSA and ViT feature. The extensive experiments show that our proposed CLIP-TSA outperforms the existing state-of-the-art (SOTA) methods by a large margin on three commonly-used benchmark datasets in the VAD problem (UCF-Crime, ShanghaiTech Campus and XD-Violence). Our source code is available at https://github.com/joos2010kj/CLIP-TSA.
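The abstract frames VAD as multiple-instance learning over video snippets with attention across time. The sketch below illustrates that general idea only and is not the authors' TSA module or training objective: it applies plain scaled dot-product self-attention (no learned weights) to per-snippet feature vectors, pools the top-k snippet scores into a video-level score, and compares an anomalous and a normal video with a hinge-style ranking loss. All function names, the top-k pooling rule, and the margin value are assumptions made for illustration; in practice the features would come from an encoder such as CLIP's ViT rather than hand-written lists.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over snippet features.

    Assumption: queries, keys, and values are the raw features themselves
    (no learned projections), so this only shows how each snippet is
    re-expressed as a weighted mix of all snippets in the video.
    """
    dim = len(features[0])
    attended = []
    for query in features:
        logits = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(logits)
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, features))
             for j in range(dim)]
        )
    return attended


def video_score(snippet_scores, k):
    """Video-level score: mean of the k highest snippet anomaly scores."""
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / len(top)


def mil_ranking_loss(anomalous_scores, normal_scores, k=1, margin=1.0):
    """Hinge loss that pushes the anomalous video's score above the normal one."""
    gap = video_score(anomalous_scores, k) - video_score(normal_scores, k)
    return max(0.0, margin - gap)


if __name__ == "__main__":
    # Hypothetical 2-D features for three snippets of one video.
    snippets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(self_attention(snippets))
    # Hypothetical per-snippet anomaly scores for two videos.
    print(mil_ranking_loss([0.1, 0.9, 0.5], [0.2, 0.1, 0.3], k=1))
```

Only video-level labels are needed to compute this loss, which is what makes the setting weakly supervised: the snippets with the highest scores are treated as the likely location of the anomaly.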
Pages: 3230-3234
Number of pages: 5