Dynamic Erasing Network With Adaptive Temporal Modeling for Weakly Supervised Video Anomaly Detection

Cited by: 0
Authors
Zhang, Chen [1, 2]
Li, Guorong [3 ]
Qi, Yuankai [4 ]
Ye, Hanhua [3 ]
Qing, Laiyun [3 ]
Yang, Ming-Hsuan [5, 6, 7]
Huang, Qingming [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100085, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100049, Peoples R China
[3] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Key Lab Big Data Min & Knowledge Management, Beijing 100049, Peoples R China
[4] Macquarie Univ, Sch Comp, Sydney, NSW 2109, Australia
[5] Univ Calif Merced, Dept Elect Engn & Comp Sci, Merced, CA 95343 USA
[6] Yonsei Univ, Coll Comp, Seoul 03722, South Korea
[7] Google, Mountain View, CA 94043 USA
Funding
National Natural Science Foundation of China;
Keywords
Anomaly detection; Adaptation models; Feature extraction; Training; Weak supervision; Predictive models; Context modeling; Annotations; Adaptive systems; Visualization; Adaptive temporal modeling (ATM); dynamic erasing (DE); video anomaly detection; weak supervision; PREDICTION;
DOI
10.1109/TNNLS.2025.3553556
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly supervised video anomaly detection aims to learn a detection model using only video-level labels. Prior studies ignore the complexity or duration of the anomalies present in abnormal videos during temporal modeling. Moreover, existing works usually detect only the most abnormal segments, potentially overlooking the completeness of anomalies. To address these limitations, we propose a dynamic erasing network (DE-Net) for weakly supervised video anomaly detection, which learns video-specific temporal features via adaptive temporal modeling (ATM). Specifically, to handle duration variations of abnormal events, we propose an ATM module capable of adaptively selecting and aggregating the K most appropriate temporal scale features for each video. Then, we design a dynamic erasing (DE) strategy that dynamically assesses the completeness of the detected anomalies and erases prominent abnormal segments to encourage the model to discover subtler abnormal segments. The proposed method achieves favorable performance compared to several state-of-the-art approaches on the widely used XD-Violence, TAD, and UCF-Crime datasets.
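The two ideas named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the per-scale scores, the score-weighted averaging, the erasing threshold `ratio`, and all function names are assumptions made purely for illustration, since the abstract does not specify how scales are scored or how completeness is assessed.

```python
def select_top_k_scales(scale_scores, k):
    """Return the indices of the k highest-scoring temporal scales.

    scale_scores is a hypothetical per-video relevance score for each scale.
    """
    order = sorted(range(len(scale_scores)),
                   key=lambda i: scale_scores[i], reverse=True)
    return sorted(order[:k])


def aggregate_scales(features_per_scale, scale_scores, k):
    """Combine the selected scales by a score-weighted average (assumed rule)."""
    chosen = select_top_k_scales(scale_scores, k)
    total = sum(scale_scores[i] for i in chosen)
    length = len(features_per_scale[0])
    return [
        sum(scale_scores[i] / total * features_per_scale[i][t] for i in chosen)
        for t in range(length)
    ]


def erase_prominent(segment_scores, ratio=0.5):
    """Return a keep-mask: False for segments scoring at least ratio * peak.

    Erased segments would be hidden in a later pass so that less
    prominent abnormal segments can surface (assumed thresholding rule).
    """
    peak = max(segment_scores)
    return [score < ratio * peak for score in segment_scores]


if __name__ == "__main__":
    scores = [0.0, 3.0, 1.0]                      # hypothetical scale scores
    feats = [[1.0, 1.0], [2.0, 2.0], [4.0, 4.0]]  # one toy feature row per scale
    print(select_top_k_scales(scores, 2))
    print(aggregate_scales(feats, scores, 2))
    print(erase_prominent([0.9, 0.2, 0.5, 0.1]))
```

In this sketch, K controls how many temporal scales contribute to a given video, and the keep-mask marks which segments remain visible after the most prominent ones are erased.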
Pages: 15