Background-Aware Robust Context Learning for Weakly-Supervised Temporal Action Localization

Cited by: 2
Authors
Kim, Jinah [1]
Cho, Jungchan [1]
Affiliations
[1] Gachon Univ, Coll Informat Technol, Seongnam Si 13120, Gyeonggi Do, South Korea
Funding
National Research Foundation, Singapore
Keywords
Entropy; Feature extraction; Location awareness; Context modeling; Annotations; Training; Reliability; Temporal action localization; entropy maximization; context learning; feature adaptation
DOI
10.1109/ACCESS.2022.3183789
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Weakly supervised temporal action localization (WTAL) aims to localize the temporal intervals of actions in an untrimmed video using only video-level action labels. Although learning the background is an important issue in WTAL, most previous studies have not utilized the background effectively. In this study, we propose a novel method for robustly separating contexts, e.g., action-like background, from the foreground to localize action intervals more accurately. First, we detect background segments based on their probabilities to minimize the impact of background estimation errors. Second, we define an entropy boundary for the foreground and require a positive distance between this boundary and the background entropy. Together, the background probability and the entropy boundary allow the segment-level classifier to learn the background robustly. Third, we improve the overall actionness model based on a consensus of the RGB and flow features. The results of extensive experiments demonstrate that the proposed method learns the context separately from the action, consequently achieving new state-of-the-art results on the THUMOS-14 and ActivityNet-1.2 benchmarks. We also confirm that feature adaptation helps overcome the limitations of a pretrained feature extractor on datasets that contain many background segments, such as THUMOS-14.
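The entropy-boundary idea in the abstract can be sketched in a few lines. The sketch below is an illustrative reconstruction, not the authors' code: the function name, the fraction used for the foreground entropy boundary (`boundary_frac`), and the `margin` value are assumptions. It penalizes likely-background segments whose class-posterior entropy falls below the foreground boundary plus a positive margin, weighting each penalty by the estimated background probability so that background-estimation errors have less impact.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the class axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def background_entropy_loss(logits, bg_prob, boundary_frac=0.5, margin=0.2):
    """Illustrative entropy-margin background loss (assumed form).

    logits:  (T, C) segment-level class scores for T segments, C classes
    bg_prob: (T,)   estimated probability that each segment is background
    boundary_frac: fraction of the maximum entropy log(C) used as the
                   foreground entropy boundary (assumption)
    margin:  required positive distance between the boundary and the
             background entropy
    """
    probs = softmax(logits)
    # Per-segment entropy of the class posterior.
    entropy = -(probs * np.log(probs + 1e-8)).sum(axis=-1)
    boundary = boundary_frac * np.log(logits.shape[-1])
    # Hinge penalty: background segments should have entropy at least
    # `margin` above the foreground boundary (i.e., near-uniform posteriors).
    penalty = np.maximum(0.0, boundary + margin - entropy)
    # Weight by background probability to soften estimation errors.
    return float((bg_prob * penalty).mean())
```

A confidently classified (low-entropy) segment that is estimated to be background incurs a positive loss, pushing its posterior toward uniform, while a near-uniform (high-entropy) segment incurs none; scaling by `bg_prob` means uncertain background estimates contribute proportionally less.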
Pages: 65315-65325
Page count: 11