Dynamic Context Removal: A General Training Strategy for Robust Models on Video Action Predictive Tasks

Cited by: 2

Authors:

Xu, Xinyu [1]

Li, Yong-Lu [1]

Lu, Cewu [1]

Affiliations:

[1] Shanghai Jiao Tong University, Shanghai 200240, People's Republic of China

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2023 / Vol. 131 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Dynamic Context Removal; Video Action Anticipation; Early Action Recognition; Robustness;

DOI:

10.1007/s11263-023-01850-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Predicting future actions is an essential capability of intelligent systems and embodied AI. However, compared to traditional recognition tasks, the uncertainty of the future and the need for reasoning ability make prediction tasks very challenging and far from solved. In this field, previous methods have focused mainly on model architecture design, and little attention has been paid to how to train models with a proper learning policy. To this end, we propose a simple but effective training strategy, Dynamic Context Removal (DCR), which dynamically schedules the visibility of context in different training stages. It follows a human-like curriculum learning process, i.e., gradually removing the event context to increase the prediction difficulty until the final prediction target is reached. In addition, we explore how to train robust models that give consistent predictions at different levels of observable context. Our learning scheme is plug-and-play and easy to integrate with widely used reasoning models, including Transformer and LSTM, with advantages in both effectiveness and efficiency. We study two action prediction problems, i.e., Video Action Anticipation and Early Action Recognition. In extensive experiments, our method achieves state-of-the-art results on several widely used benchmarks.
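The abstract describes the curriculum only at a high level, so the following Python sketch is an illustration and not the authors' implementation. It assumes a linear schedule, a clip split into frames that are always observed plus extra context frames that are visible early in training, and removal of the latest context frames first. The names `visible_context_fraction` and `mask_context` are hypothetical.

```python
def visible_context_fraction(epoch: int, total_epochs: int,
                             final_fraction: float = 0.0) -> float:
    """Share of the extra context that stays visible at a given epoch.

    Assumption: a linear decay from 1.0 (all context visible) at the first
    epoch to `final_fraction` at the last epoch. The abstract does not
    state the actual schedule.
    """
    if total_epochs <= 1:
        return final_fraction
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return 1.0 - progress * (1.0 - final_fraction)


def mask_context(frames: list, num_observed: int, fraction: float) -> list:
    """Return the always-observed frames plus the visible part of the context.

    Assumption: frames[:num_observed] are always given to the model, and the
    remaining frames are extra context that is dropped from the end first.
    """
    context = frames[num_observed:]
    keep = int(len(context) * fraction)  # floor, so removal never lags the schedule
    return frames[:num_observed] + context[:keep]


if __name__ == "__main__":
    clip = list(range(8))  # stand-in for 8 frame features
    for epoch in range(5):
        fraction = visible_context_fraction(epoch, total_epochs=5)
        print(epoch, fraction, mask_context(clip, num_observed=4, fraction=fraction))
```

In a training loop, the masked clip would be passed to the reasoning model (for example a Transformer or LSTM) in place of the full clip, so that the input in the last epoch matches the final prediction setting.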


Pages: 3272-3288

Page count: 17

