Temporal Dropout for Weakly Supervised Action Localization

Cited by: 8
Authors
Xie, Chi [1]
Zhuang, Zikun [1]
Zhao, Shengjie [1]
Liang, Shuang [1]
Affiliations
[1] Tongji Univ, 4800 Caoan Rd, Shanghai, Peoples R China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China
Keywords
Weakly supervised; temporal action localization; adversarial erasing; adaptive dropout; NETWORK;
DOI
10.1145/3567827
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Weakly supervised action localization is a challenging problem in video understanding and action recognition. Existing models usually formulate training as direct classification under video-level supervision. As a result, they tend to locate only the most discriminative parts of action instances and produce temporally incomplete detections. A natural remedy is the adversarial erasing strategy, which removes such parts during training so that models attend to complementary parts. Previous works apply it in an offline, heuristic way: they adopt a multi-stage pipeline in which discriminative regions are determined and erased under the guidance of detection results from the previous stage. Such a pipeline can be both ineffective and inefficient, possibly hindering overall performance. In contrast, we combine adversarial erasing with the dropout mechanism and propose a Temporal Dropout Module that learns where to erase in a data-driven, online manner. This plug-and-play module is trained without iterative stages, which not only simplifies the pipeline but also makes regularization during training easier and more adaptive. Experiments show that the proposed method outperforms previous erasing-based methods by a large margin. More importantly, it yields consistent improvements when plugged into various direct classification methods and achieves state-of-the-art performance.
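The idea of erasing the most discriminative snippets online, rather than in a separate offline stage, can be illustrated with a minimal sketch. It is not the authors' Temporal Dropout Module: the function name `temporal_dropout`, the linear scorer `scorer_w`, and the fixed `drop_ratio` are all assumptions made for illustration, and the abstract does not specify how the module scores or selects snippets.

```python
import numpy as np


def temporal_dropout(features, scorer_w, drop_ratio=0.25, training=True):
    """Zero out the highest-scoring snippets of one video (illustrative only).

    features : (T, D) array of snippet features.
    scorer_w : (D,) weights of a hypothetical linear scorer; in a real model
               these would be learned jointly with the classifier.
    drop_ratio : assumed fraction of snippets to erase per video.
    Returns the masked features and the boolean keep-mask of shape (T,).
    """
    num_snippets = features.shape[0]
    keep = np.ones(num_snippets, dtype=bool)
    if not training:
        # Like ordinary dropout, nothing is erased at inference time.
        return features, keep

    scores = features @ scorer_w                    # (T,) discriminativeness
    num_drop = int(round(drop_ratio * num_snippets))
    if num_drop > 0:
        # Erase the snippets the scorer currently finds most discriminative,
        # so the classifier has to rely on the complementary ones.
        drop_idx = np.argsort(scores)[-num_drop:]
        keep[drop_idx] = False
    return features * keep[:, None], keep


# Toy usage: 8 snippets with 4-dimensional features.
demo_features = np.arange(32, dtype=float).reshape(8, 4)
demo_masked, demo_keep = temporal_dropout(demo_features, np.ones(4))
```

Because the mask is recomputed from the current scores on every forward pass, no intermediate detection results need to be stored between stages, which is the contrast with multi-stage erasing that the abstract draws.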
Pages: 24