Multimodal Evidential Learning for Open-World Weakly-Supervised Video Anomaly Detection

Cited by: 1
Authors
Huang, Chao [1]
Huang, Weiliang [2]
Jiang, Qiuping [3]
Wang, Wei [1]
Wen, Jie [4]
Zhang, Bob [2]
Affiliations
[1] Shenzhen Campus Sun Yat-sen Univ, Sch Cyber Sci & Technol, Shenzhen 518000, Peoples R China
[2] Univ Macau, Dept Comp & Informat Sci, PAMI Res Grp, Macau 999078, Peoples R China
[3] Ningbo Univ, Fac Informat Sci & Engn, Ningbo 315211, Peoples R China
[4] Harbin Inst Technol, Shenzhen Key Lab Visual Object Detect & Recognit, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Anomaly detection; Training; Uncertainty; Feature extraction; Annotations; Deep learning; Correlation; Collaboration; Probabilistic logic; Video anomaly detection; vision-language model; evidential learning; UNDERWATER IMAGE-ENHANCEMENT;
DOI
10.1109/TMM.2025.3557682
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Weakly-supervised video anomaly detection aims to detect abnormal events in videos using only coarse-grained labels, and it has been successfully applied in many real-world applications. However, a significant limitation of most existing methods is that they are effective only for specific objects in specific scenarios, which makes them prone to misclassifying or missing previously unseen anomalies. Compared with conventional anomaly detection tasks, Open-world Weakly-supervised Video Anomaly Detection (OWVAD) is more challenging because unknown anomalies have neither labels nor fine-grained annotations. To address this problem, we propose a multi-scale evidential vision-language model for open-world video anomaly detection. Specifically, we leverage the generalized vision-language associations learned by CLIP to exploit large pre-trained models for the OWVAD task. We then integrate a multi-scale temporal modeling module with a multimodal evidence collector to achieve precise frame-level detection of both seen and unseen anomalies. Extensive experiments on two widely used benchmarks validate the effectiveness of our method. The code will be made publicly available.
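The evidential learning named in the abstract is commonly formulated with a Dirichlet distribution over class probabilities, where a low total of collected evidence yields a high uncertainty mass that can flag inputs unlike anything seen in training. The following is a minimal sketch of that generic formulation only, not the authors' multimodal evidence collector. The function names, the softplus evidence activation, and the two-class (normal/abnormal) setup are illustrative assumptions.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: maps any real score to positive evidence."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def evidential_opinion(scores):
    """Convert per-class scores for one frame into belief masses and uncertainty.

    Hypothetical helper: follows the standard Dirichlet-based evidential
    formulation (alpha = evidence + 1), not the specific model in the paper.
    """
    evidence = [softplus(s) for s in scores]   # non-negative evidence per class
    alpha = [e + 1.0 for e in evidence]        # Dirichlet concentration parameters
    strength = sum(alpha)                      # Dirichlet strength S
    num_classes = len(scores)
    belief = [e / strength for e in evidence]  # belief mass b_k = e_k / S
    uncertainty = num_classes / strength       # uncertainty mass u = K / S
    return belief, uncertainty


# Assumed class order: [normal, abnormal].
confident_belief, confident_u = evidential_opinion([6.0, -4.0])
vague_belief, vague_u = evidential_opinion([-4.0, -4.0])
print(confident_u < vague_u)  # little evidence for any class gives higher uncertainty
```

In this formulation the belief masses and the uncertainty mass always sum to one, so a frame with weak evidence for every known class is assigned a large uncertainty rather than being forced into a confident prediction.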
Pages: 3132-3143
Number of pages: 12