Multi-Hierarchical Category Supervision for Weakly-Supervised Temporal Action Localization

Cited by: 12

Authors:

Li, Guozhang [1]

Li, Jie [1]

Wang, Nannan [2]

Ding, Xinpeng [3]

Li, Zhifeng [4]

Gao, Xinbo [5,6]

Affiliations:

[1] Xidian Univ, Sch Elect Engn, State Key Lab Integrat Serv Networks, Xian 710071, Shaanxi, Peoples R China

[2] Xidian Univ, Sch Telecommun Engn, State Key Lab Integrat Serv Networks, Xian 710071, Shaanxi, Peoples R China

[3] Hong Kong Univ Sci & Technol, Sch Engn, Hong Kong, Peoples R China

[4] Tencent, Shenzhen 518057, Peoples R China

[5] Xidian Univ, Sch Elect Engn, Xian 710071, Peoples R China

[6] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Image Cognit, Chongqing 400065, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2021 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Videos; Location awareness; Training; Feature extraction; Bars; Proposals; Measurement; Weak supervision; temporal action localization; multi-hierarchical categories;

DOI:

10.1109/TIP.2021.3124671

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Weakly Supervised Temporal Action Localization (WTAL) aims to localize action segments in untrimmed videos using only video-level category labels during training. In WTAL, an action generally consists of a series of sub-actions, and different action categories may share common sub-actions. However, to distinguish action categories with only video-level class labels, current WTAL models tend to focus on the discriminative sub-actions of an action while ignoring the common sub-actions it shares with other categories. Neglecting these common sub-actions leaves the localized action segments incomplete, i.e., containing only discriminative sub-actions. Unlike current approaches that design complex network architectures to explore more complete actions, in this paper we introduce a novel supervision method named multi-hierarchical category supervision (MHCS) to find more sub-actions rather than only the discriminative ones. Specifically, action categories sharing similar sub-actions are grouped into super-classes through hierarchical clustering. Training with the newly generated super-classes then encourages the model to pay more attention to the common sub-actions, which are ignored when training with the original classes. Furthermore, the proposed MHCS is model-agnostic and non-intrusive, so it can be applied directly to existing methods without changing their structures. Through extensive experiments, we verify that our supervision method improves the performance of four state-of-the-art WTAL methods on three public datasets: THUMOS14, ActivityNet1.2, and ActivityNet1.3.
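
The super-class idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which features or linkage the clustering uses, so the per-class prototype vectors, the cosine distance, the average linkage, and the helper names `cluster_classes` and `super_labels` are all assumptions made for illustration. The class names are borrowed from THUMOS14, but the numbers are invented.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def cluster_classes(prototypes, num_super):
    """Average-linkage agglomerative clustering of class prototypes.

    prototypes: dict mapping class name -> feature vector
                (hypothetical per-class features, an assumption here).
    Returns a sorted list of clusters, each a sorted list of class names.
    """
    clusters = [[name] for name in sorted(prototypes)]
    while len(clusters) > num_super:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = sum(cosine_distance(prototypes[a], prototypes[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)

def super_labels(video_classes, clusters):
    """Indices of the super-classes that contain any of a video's class labels."""
    return sorted({k for k, c in enumerate(clusters)
                   for name in video_classes if name in c})

# Toy prototypes (made-up numbers, for illustration only).
prototypes = {
    "HighJump": [0.9, 0.8, 0.1],
    "LongJump": [0.8, 0.9, 0.2],
    "GolfSwing": [0.1, 0.2, 0.9],
    "TennisSwing": [0.2, 0.1, 0.8],
}
clusters = cluster_classes(prototypes, num_super=2)
print(clusters)
print(super_labels(["HighJump"], clusters))
```

With these toy vectors the two jump classes end up in one super-class and the two swing classes in the other, so a video labeled only "HighJump" also receives a coarser label that it shares with "LongJump" videos. Stopping the merging at several values of `num_super` would give labels at several levels of the hierarchy, which is one plausible reading of "multi-hierarchical".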


Pages: 9332-9344

Page count: 13

