Long-Short Temporal Co-Teaching for Weakly Supervised Video Anomaly Detection

Cited by: 10
Authors
Sun, Shengyang [1 ]
Gong, Xiaojin [1 ]
Affiliations
[1] Zhejiang University, College of Information Science & Electronic Engineering, Hangzhou, Zhejiang, China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023
Keywords
Video anomaly detection; weak supervision; co-teaching strategy
DOI
10.1109/ICME55011.2023.00461
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Weakly supervised video anomaly detection (WS-VAD) is a challenging problem that aims to learn VAD models with only video-level annotations. In this work, we propose a Long-Short Temporal Co-teaching (LSTC) method to address the WS-VAD problem. It constructs two tubelet-based spatiotemporal transformer networks to learn from short- and long-term video clips, respectively. Each network is trained with a multiple instance learning (MIL)-based ranking loss, together with a cross-entropy loss when clip-level pseudo labels are available. A co-teaching strategy is adopted to train the two networks: clip-level pseudo labels generated by each network are used to supervise the other in the next training round, and the two networks are trained alternately and iteratively. Our proposed method better handles anomalies of varying durations as well as subtle anomalies. Extensive experiments on three public datasets demonstrate that our method outperforms state-of-the-art WS-VAD methods. Code is available at https://github.com/shengyangsun/LSTC_VAD.
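The MIL-based ranking loss mentioned in the abstract can be sketched as follows. This is a generic illustration of the standard MIL ranking formulation commonly used in WS-VAD (a hinge loss pushing the top-scoring clip of an anomalous video above the top-scoring clip of a normal video by a margin), not the paper's exact implementation; the function and variable names are illustrative.

```python
def mil_ranking_loss(abnormal_scores, normal_scores, margin=1.0):
    """Hinge-style MIL ranking loss over two bags of clip scores.

    abnormal_scores: anomaly scores of clips from a video labeled anomalous
    normal_scores:   anomaly scores of clips from a video labeled normal
    Only the most anomalous clip in each bag contributes, reflecting the
    weak (video-level) supervision: at least one clip in an anomalous
    video should score higher than every clip in a normal video.
    """
    top_abnormal = max(abnormal_scores)  # most anomalous clip, abnormal bag
    top_normal = max(normal_scores)      # most anomalous clip, normal bag
    return max(0.0, margin - top_abnormal + top_normal)


if __name__ == "__main__":
    # Well-separated bags incur no loss; overlapping ones are penalized.
    print(mil_ranking_loss([2.0, 0.1], [0.0, 0.2]))  # 0.0
    print(mil_ranking_loss([0.9, 0.2], [0.1, 0.3]))  # ~0.4
```

In the paper's co-teaching setup, this ranking loss would be complemented by a cross-entropy term on clip-level pseudo labels exchanged between the short-term and long-term networks at each training round.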
Pages: 2711-2716 (6 pages)