Human-Scene Network: A novel baseline with self-rectifying loss for weakly supervised video anomaly detection

Cited by: 3

Authors:

Majhi, Snehashis ^{[1]}

Dai, Rui ^{[1]}

Kong, Quan ^{[2]}

Garattoni, Lorenzo ^{[3]}

Francesca, Gianpiero ^{[3]}

Bremond, Francois ^{[1]}

Affiliations:

[1] INRIA, 2004 Rte Lucioles, Valbonne, France

[2] Woven Planet Holdings, 3-2-1 Nihonbashimuromachi, Chuo Ku, Tokyo, Japan

[3] Toyota Motor Europe, 60 Av Bourget, Brussels, Belgium

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2024 / Vol. 241

Keywords:

Video anomaly detection; Weakly-supervised learning; Abnormal event detection

DOI:

10.1016/j.cviu.2024.103955

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Video anomaly detection in surveillance systems with only video-level labels (i.e., weakly supervised) is challenging. This is due to (i) the complex integration of a large variety of scenarios, including human- and scene-based anomalies characterized by subtle or sharp spatio-temporal cues in real-world videos, and (ii) non-optimal optimization between normal and anomaly instances under weak supervision. In this paper, we propose a Human-Scene Network to learn discriminative representations by capturing both subtle and strong cues in a dissociative manner. In addition, a self-rectifying loss is proposed that dynamically computes the pseudo-temporal annotations from video-level labels for optimizing the Human-Scene Network effectively. The proposed Human-Scene Network optimized with self-rectifying loss is validated on three publicly available datasets, i.e., UCF-Crime, ShanghaiTech, and IITB-Corridor, outperforming recently reported state-of-the-art approaches on five out of the six scenarios considered.

References

Pages: 11

50 entries in total

[1] Robust real-time unusual event detection using multiple fixed-location monitors [J].

Adam, Amit ;

Rivlin, Ehud ;

Shimshoni, Ilan ;

Reinitz, David .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2008, 30 (03) :555-560

[2] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J].

Carreira, Joao ;

Zisserman, Andrew .

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :4724-4733

[3] Chen Weiling, 2023, P IEEECVF C COMPUTER, P5548

[4] Chen YX, 2023, AAAI CONF ARTIF INTE, P387

[5] Learning a similarity metric discriminatively, with application to face verification [J].

Chopra, S ;

Hadsell, R ;

LeCun, Y .

2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, :539-546

[6] Abnormal event detection in crowded scenes using sparse representation [J].

Cong, Yang ;

Yuan, Junsong ;

Liu, Ji .

PATTERN RECOGNITION, 2013, 46 (07) :1851-1864

[7] Learning Spatiotemporal Features with 3D Convolutional Networks [J].

Du Tran ;

Bourdev, Lubomir ;

Fergus, Rob ;

Torresani, Lorenzo ;

Paluri, Manohar .

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :4489-4497

[8] Fan Y., 2024, IEEE Transactions on Circuits and Systems for Video Technology

[9] MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection [J].

Feng, Jia-Chang ;

Hong, Fa-Ting ;

Zheng, Wei-Shi .

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :14004-14013

[10] Learning Temporal Regularity in Video Sequences [J].

Hasan, Mahmudul ;

Choi, Jonghyun ;

Neumann, Jan ;

Roy-Chowdhury, Amit K. ;

Davis, Larry S. .

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :733-742
