Multi-Hierarchical Category Supervision for Weakly-Supervised Temporal Action Localization

Times cited: 12
Authors
Li, Guozhang [1]
Li, Jie [1]
Wang, Nannan [2]
Ding, Xinpeng [3]
Li, Zhifeng [4]
Gao, Xinbo [5, 6]
Affiliations
[1] Xidian Univ, Sch Elect Engn, State Key Lab Integrat Serv Networks, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Sch Telecommun Engn, State Key Lab Integrat Serv Networks, Xian 710071, Shaanxi, Peoples R China
[3] Hong Kong Univ Sci & Technol, Sch Engn, Hong Kong, Peoples R China
[4] Tencent, Shenzhen 518057, Peoples R China
[5] Xidian Univ, Sch Elect Engn, Xian 710071, Peoples R China
[6] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Image Cognit, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Videos; Location awareness; Training; Feature extraction; Bars; Proposals; Measurement; Weak supervision; temporal action localization; multi-hierarchical categories;
DOI
10.1109/TIP.2021.3124671
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Weakly Supervised Temporal Action Localization (WTAL) aims to localize action segments in untrimmed videos using only video-level category labels during training. In WTAL, an action generally consists of a series of sub-actions, and actions of different categories may share common sub-actions. However, because they must distinguish action categories using only video-level class labels, current WTAL models tend to focus on the discriminative sub-actions of an action while ignoring the common sub-actions it shares with other categories. Neglecting these common sub-actions leaves the localized action segments incomplete, i.e., containing only the discriminative sub-actions. Unlike current approaches, which design complex network architectures to capture more complete actions, in this paper we introduce a novel supervision method, multi-hierarchical category supervision (MHCS), that finds more sub-actions than just the discriminative ones. Specifically, action categories that share similar sub-actions are grouped into super-classes through hierarchical clustering. Because the sub-actions shared within a super-class help distinguish it from other super-classes, training with these newly generated super-classes encourages the model to attend to the common sub-actions, which are ignored when training with the original classes. Furthermore, MHCS is model-agnostic and non-intrusive: it can be applied directly to existing methods without changing their structures. Extensive experiments verify that MHCS improves the performance of four state-of-the-art WTAL methods on three public datasets: THUMOS14, ActivityNet1.2, and ActivityNet1.3.
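The abstract states only that categories sharing similar sub-actions are grouped into super-classes by hierarchical clustering and that training additionally uses these super-classes. The Python sketch below illustrates that idea under assumptions that do not come from the paper: each class is represented by a hypothetical prototype vector (for instance, a mean feature of its training videos), distances are cosine distances, clusters are merged by average linkage, and cutting the dendrogram at several cluster counts yields the multiple hierarchies. The super-class probabilities passed to `multi_level_loss` are assumed to come from whatever head the host WTAL model exposes. All function names are illustrative and are not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two (hypothetical) class prototype vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def build_superclass_levels(
    prototypes: List[List[float]], cluster_counts: List[int]
) -> Dict[int, List[int]]:
    """Average-linkage agglomerative clustering of the original classes.

    Returns {k: assignment}, where assignment[c] is the super-class id of
    original class c when the dendrogram is cut at k clusters.
    """
    n = len(prototypes)
    dist = [[cosine_distance(prototypes[i], prototypes[j]) for j in range(n)]
            for i in range(n)]
    clusters: List[List[int]] = [[i] for i in range(n)]
    wanted = set(cluster_counts)
    levels: Dict[int, List[int]] = {}
    while True:
        # Record the current partition if this cluster count was requested.
        if len(clusters) in wanted:
            assignment = [0] * n
            for cluster_id, members in enumerate(clusters):
                for c in members:
                    assignment[c] = cluster_id
            levels[len(clusters)] = assignment
        if len(clusters) <= 1 or len(levels) == len(wanted):
            break
        # Merge the pair of clusters with the smallest average pairwise distance.
        best = (math.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return levels


def to_superclass_labels(labels: List[int], assignment: List[int], num_super: int) -> List[int]:
    """Map a multi-hot video-level label over original classes to super-classes."""
    targets = [0] * num_super
    for c, y in enumerate(labels):
        if y:
            targets[assignment[c]] = 1
    return targets


def bce(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over classes, with probabilities clipped for stability."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def multi_level_loss(
    class_probs: Sequence[float],
    labels: List[int],
    super_probs_by_level: Dict[int, Sequence[float]],
    levels: Dict[int, List[int]],
    weight: float = 1.0,
) -> float:
    """Original-class loss plus a weighted super-class loss for each hierarchy level."""
    loss = bce(class_probs, labels)
    for k, probs in super_probs_by_level.items():
        loss += weight * bce(probs, to_superclass_labels(labels, levels[k], k))
    return loss


if __name__ == "__main__":
    # Four toy classes: 0 and 1 are similar, 2 and 3 are similar.
    protos = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]
    levels = build_superclass_levels(protos, [2, 1])
    print(levels)  # {2: [0, 0, 1, 1], 1: [0, 0, 0, 0]}
```

Cutting one dendrogram at several counts keeps the levels nested: classes merged at a finer level stay together at every coarser level. Because the extra loss only needs video-level probabilities, a sketch like this leaves the host model's localization pipeline untouched, in the spirit of the non-intrusive claim above.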
Pages: 9332-9344
Number of pages: 13