Graph Convolutional Networks for Temporal Action Localization

Cited by: 426
Authors
Zeng, Runhao [1 ,2 ]
Huang, Wenbing [2 ,5 ]
Tan, Mingkui [1 ,4 ]
Rong, Yu [2 ]
Zhao, Peilin [2 ]
Huang, Junzhou [2 ]
Gan, Chuang [3 ]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
[3] MIT, IBM Watson AI Lab, Cambridge, MA 02139 USA
[4] Peng Cheng Lab, Shenzhen, Peoples R China
[5] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol TNList, Dept Comp Sci & Technol, Beijing, Peoples R China
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/ICCV.2019.00719
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting their relations during learning. However, the relations between proposals actually play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is represented as a node and the relation between two proposals as an edge. Here, we use two types of relations: one for capturing the context information for each proposal, and the other for characterizing the correlations between distinct actions. Then, we apply GCNs over the graph to model the relations among different proposals and learn powerful representations for action classification and localization. Experimental results show that our approach significantly outperforms the state-of-the-art on THUMOS14 (49.1% versus 42.8%). Moreover, augmentation experiments on ActivityNet also verify the efficacy of modeling action proposal relationships.
Pages: 7093-7102
Page count: 10
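The propagation step described in the abstract can be sketched as a single graph convolution over proposal features. This is a minimal illustration assuming the standard GCN propagation rule; the paper's actual edge construction, feature extractor, and network depth differ in detail, and the adjacency below is a hypothetical toy example.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One GCN layer: H = ReLU(D^-1/2 (A + I) D^-1/2 X W).

    X : (N, D) per-proposal features (e.g. pooled clip features per proposal)
    A : (N, N) binary adjacency over proposals (context/correlation edges)
    W : (D, F) learnable weight matrix
    """
    N = A.shape[0]
    A_hat = A + np.eye(N)                      # add self-loops
    deg = A_hat.sum(axis=1)                    # node degrees (always >= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))   # symmetric normalization D^-1/2
    H = D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W
    return np.maximum(H, 0.0)                  # ReLU

# Toy example: 4 proposals, 8-dim features; edges connect proposals that
# (hypothetically) overlap or neighbor each other in time.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = rng.normal(size=(8, 16))
H = gcn_layer(X, A, W)
print(H.shape)  # (4, 16): updated per-proposal representations
```

Each proposal's new representation aggregates features from its graph neighbors, which is how context and inter-action correlations enter the classification and localization heads.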