Boundary Content Graph Neural Network for Temporal Action Proposal Generation

Cited by: 122

Authors:

Bai, Yueran [1]

Wang, Yingying [2]

Tong, Yunhai [1]

Yang, Yang [2]

Liu, Qiyue [2]

Liu, Junhui [2]

Affiliations:

[1] Peking Univ, Sch EECS, Key Lab Machine Percept MOE, Beijing, Peoples R China

[2] IQIYI Inc, Beijing, Peoples R China

Source:

COMPUTER VISION - ECCV 2020, PT XXVIII | 2020 / Vol. 12373

Keywords:

Temporal action proposal generation; Graph Neural Network; Temporal action detection;

DOI:

10.1007/978-3-030-58604-1_8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Temporal action proposal generation plays an important role in video action understanding, which requires localizing high-quality action content precisely. However, generating temporal proposals with both precise boundaries and high-quality action content is extremely challenging. To address this issue, we propose a novel Boundary Content Graph Neural Network (BC-GNN) that uses a graph neural network to model the relations between the boundaries and the action content of temporal proposals. In BC-GNN, the boundaries and content of temporal proposals are taken as the nodes and edges of the graph, respectively, so that they are naturally linked. A novel graph computation operation is then proposed to update the features of edges and nodes. After that, each updated edge and the two nodes it connects are used to predict boundary probabilities and a content confidence score, which are combined to generate a final high-quality proposal. Experiments are conducted on two mainstream datasets: ActivityNet-1.3 and THUMOS14. Without bells and whistles, BC-GNN outperforms previous state-of-the-art methods in both temporal action proposal and temporal action detection tasks.
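The node/edge structure described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the function name `score_proposals`, the random weights, the mean-pooled edge features, the single update step, and the product used to combine the three scores are all assumptions made only to show how boundaries (nodes) and content (edges) might interact.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def score_proposals(features, rng):
    """Toy boundary/content graph scoring (illustrative assumptions only).

    features: (T, D) array, one feature vector per temporal location.
    Returns a list of (start, end, score) for every pair with start < end.
    """
    T, D = features.shape

    # Nodes: one start node and one end node per temporal location.
    start_nodes = features.copy()
    end_nodes = features.copy()

    # Edges: one per (start, end) pair; the content feature is assumed to be
    # the mean of the features inside the span.
    pairs = [(s, e) for s in range(T) for e in range(s + 1, T)]
    edges = np.stack([features[s:e + 1].mean(axis=0) for s, e in pairs])
    s_idx = np.array([s for s, _ in pairs])
    e_idx = np.array([e for _, e in pairs])

    # Hypothetical, untrained weights (random stand-ins for learned parameters).
    W_edge = rng.normal(scale=0.1, size=(3 * D, D))
    W_node = rng.normal(scale=0.1, size=(2 * D, D))
    w_start, w_end, w_content = rng.normal(scale=0.1, size=(3, D))

    # Edge update: each edge sees its own feature plus its two endpoint nodes.
    edge_in = np.concatenate([edges, start_nodes[s_idx], end_nodes[e_idx]], axis=1)
    edges = np.tanh(edge_in @ W_edge)

    # Node update: each node averages the updated edges incident to it.
    def aggregate(idx):
        total = np.zeros((T, D))
        np.add.at(total, idx, edges)
        counts = np.bincount(idx, minlength=T).reshape(-1, 1)
        return total / np.maximum(counts, 1)

    start_nodes = np.tanh(
        np.concatenate([start_nodes, aggregate(s_idx)], axis=1) @ W_node
    )
    end_nodes = np.tanh(
        np.concatenate([end_nodes, aggregate(e_idx)], axis=1) @ W_node
    )

    # Predict boundary probabilities from nodes and content confidence from edges.
    p_start = sigmoid(start_nodes @ w_start)
    p_end = sigmoid(end_nodes @ w_end)
    conf = sigmoid(edges @ w_content)

    # Assumed combination rule: product of the three scores.
    scores = p_start[s_idx] * p_end[e_idx] * conf
    return [(int(s), int(e), float(v)) for (s, e), v in zip(pairs, scores)]
```

Calling `score_proposals(rng.normal(size=(6, 4)), rng)` with `rng = np.random.default_rng(0)` yields one scored candidate per start/end pair (15 for six temporal locations); with untrained weights the scores are meaningless and serve only to show the data flow.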


Pages: 121-137

Page count: 17

