Toward Video Anomaly Retrieval From Video Anomaly Detection: New Benchmarks and Model

Cited by: 9

Authors:

Wu, Peng [1]

Liu, Jing [2]

He, Xiangteng [3]

Peng, Yuxin [3]

Wang, Peng [1]

Zhang, Yanning [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Natl Engn Lab Integrated Aerosp Ground Ocean Big D, Xian 710060, Peoples R China

[2] Xidian Univ, Guangzhou Inst Technol, Guangzhou 510555, Peoples R China

[3] Peking Univ, Wangxuan Inst Comp Technol, Beijing 100871, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Vol. 33

Funding:

National Natural Science Foundation of China;

Keywords:

Video anomaly retrieval; video anomaly detection; cross-modal retrieval;

DOI:

10.1109/TIP.2024.3374070

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video anomaly detection (VAD) has received increasing attention because of its potential applications. Its dominant tasks currently focus on detecting anomalies online, which can be roughly interpreted as binary or multi-class event classification. However, a setup that maps complicated anomalous events to single labels, e.g., "vandalism", is superficial, since a single label is insufficient to characterize an anomalous event. In practice, users tend to search for a specific video rather than a series of approximate videos. Retrieving anomalous events using detailed descriptions is therefore practical and valuable, but little research has focused on it. In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos across modalities, e.g., from language descriptions and synchronous audio. Unlike current video retrieval, where videos are assumed to be temporally well-trimmed and short, VAR is designed to retrieve long untrimmed videos that may be only partially relevant to the given query. To achieve this, we present two large-scale VAR benchmarks and design a model called Anomaly-Led Alignment Network (ALAN) for VAR. In ALAN, we propose anomaly-led sampling to focus on key segments in long untrimmed videos. We then introduce an efficient pretext task to enhance semantic associations between fine-grained video and text representations. In addition, we leverage two complementary alignments to further match cross-modal content. Experimental results on the two benchmarks reveal the challenges of the VAR task and demonstrate the advantages of our tailored method. Captions are publicly released at https://github.com/Roc-Ng/VAR.
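
Illustrative sketch (not from the paper): the abstract describes focusing on key segments of a long untrimmed video and then matching the video against a query from another modality. The snippet below is a toy illustration of that general idea only, under loud assumptions: per-segment anomaly scores and feature vectors are already given, a deterministic top-k selection stands in for anomaly-led sampling, mean pooling builds the video vector, and cosine similarity ranks videos. The function names `select_key_segments`, `mean_pool`, `cosine`, and `rank_videos` and all data are hypothetical; none of this reproduces ALAN.

```python
import math


def select_key_segments(scores, k):
    """Return indices of the k highest-scoring segments, in temporal order.

    Assumption: a deterministic top-k stand-in for anomaly-led sampling.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def mean_pool(vectors):
    """Average a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_videos(query_vec, videos, k):
    """Rank video names by similarity between the query and pooled key segments.

    videos maps a name to (segment_features, anomaly_scores); toy data only.
    """
    scored = []
    for name, (features, scores) in videos.items():
        keep = select_key_segments(scores, k)
        video_vec = mean_pool([features[i] for i in keep])
        scored.append((cosine(query_vec, video_vec), name))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical toy data: three segments per video, 2-D features.
VIDEOS = {
    "street": ([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0.1, 0.2, 0.9]),
    "lobby": ([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], [0.8, 0.1, 0.7]),
}

print(rank_videos([0.0, 1.0], VIDEOS, k=1))  # ['street', 'lobby']
```

In this toy setup only the highest-scoring segment of each video is kept, so a video is ranked first when its most anomalous segment resembles the query, even if the rest of the video is irrelevant. This mirrors the partial-relevance setting the abstract describes.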


Pages: 2213-2225

Page count: 13

