RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization

Cited by: 33
Authors
Pardo, Alejandro [1]
Alwassel, Humam [1]
Heilbron, Fabian Caba [2]
Thabet, Ali [1]
Ghanem, Bernard [1]
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Thuwal, Saudi Arabia
[2] Adobe Res, San Francisco, CA USA
Source
2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021) | 2021
DOI
10.1109/WACV48630.2021.00336
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video action detectors are usually trained using datasets with fully-supervised temporal annotations. Building such datasets is an expensive task. To alleviate this problem, recent methods have tried to leverage weak labeling, where videos are untrimmed and only a video-level label is available. In this paper, we propose RefineLoc, a novel weakly-supervised temporal action localization method. RefineLoc uses an iterative refinement approach by estimating and training on snippet-level pseudo ground truth at every iteration. We show the benefit of this iterative approach and present an extensive analysis of five different pseudo ground truth generators. We show the effectiveness of our model on two standard action datasets, ActivityNet v1.2 and THUMOS14. RefineLoc shows competitive results with the state-of-the-art in weakly-supervised temporal localization. Additionally, our iterative refinement process is able to significantly improve the performance of two state-of-the-art methods, setting a new state-of-the-art on THUMOS14.
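The iterative loop described in the abstract (estimate snippet-level pseudo ground truth, train on it, repeat) can be illustrated with a toy sketch. This is not the authors' implementation: the thresholding generator, the one-feature logistic model, and every name below (`pseudo_ground_truth`, `train`, `refine`) are assumptions made purely for illustration, and the video-level label that the weak supervision provides is left out for brevity.

```python
import math


def pseudo_ground_truth(scores, threshold=0.5):
    """Hypothetical generator: mark a snippet as foreground (1) if its score exceeds the threshold."""
    return [1 if s > threshold else 0 for s in scores]


def predict(features, w, b):
    """Toy snippet-level model: logistic regression on one scalar feature per snippet."""
    return [1.0 / (1.0 + math.exp(-(w * x + b))) for x in features]


def train(features, labels, w, b, lr=0.5, steps=200):
    """Fit the toy model to the current pseudo labels with plain gradient descent."""
    n = len(features)
    for _ in range(steps):
        probs = predict(features, w, b)
        grad_w = sum((p - y) * x for p, y, x in zip(probs, labels, features)) / n
        grad_b = sum(p - y for p, y in zip(probs, labels)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def refine(features, initial_scores, iterations=3):
    """Alternate between generating pseudo labels and retraining on them."""
    scores = list(initial_scores)
    w, b = 0.0, 0.0
    history = []
    for _ in range(iterations):
        labels = pseudo_ground_truth(scores)      # estimate pseudo ground truth
        w, b = train(features, labels, w, b)      # train on it
        scores = predict(features, w, b)          # refreshed snippet scores
        history.append(labels)
    return scores, history


features = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]        # made-up snippet features
initial_scores = [0.1, 0.2, 0.6, 0.4, 0.8, 0.9]     # made-up noisy starting scores
final_scores, history = refine(features, initial_scores)
print(history[0], history[-1])
```

In this toy setting the first round of pseudo labels is inconsistent with the snippet features, and retraining on them produces scores whose thresholded labels are cleaner in the following rounds. The abstract reports an analysis of five pseudo ground truth generators; the threshold rule here is only a stand-in and may not correspond to any of them.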
Pages: 3318-3327
Number of pages: 10