G-TAD: Sub-Graph Localization for Temporal Action Detection

Times cited: 372

Authors:

Xu, Mengmeng [1]

Zhao, Chen [1]

Rojas, David S. [1]

Thabet, Ali [1]

Ghanem, Bernard [1]

Affiliation:

[1] King Abdullah Univ Sci & Technol KAUST, Thuwal, Saudi Arabia

Source:

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020

DOI:

10.1109/CVPR42600.2020.01017

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Temporal action detection is a fundamental yet challenging task in video understanding. Video context is a critical cue to effectively detect actions, but current works mainly focus on temporal context, while neglecting semantic context as well as other important context properties. In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem. Specifically, we formulate video snippets as graph nodes, snippet-snippet correlations as edges, and actions associated with context as target sub-graphs. With graph convolution as the basic operation, we design a GCN block called GCNeXt, which learns the features of each node by aggregating its context and dynamically updates the edges in the graph. To localize each sub-graph, we also design an SGAlign layer to embed each sub-graph into the Euclidean space. Extensive experiments show that G-TAD is capable of finding effective video context without extra supervision and achieves state-of-the-art performance on two detection benchmarks. On ActivityNet-1.3, it obtains an average mAP of 34.09%; on THUMOS14, it reaches 51.6% at IoU@0.5 when combined with a proposal processing method. G-TAD code is publicly available at https://github.com/frostinassiky/gtad.
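The abstract describes the pipeline only at a high level. As a rough illustration, the NumPy sketch below (not the authors' code from the linked repository) shows the two ideas it names: a block that aggregates each snippet's context over fixed temporal edges and over semantic edges recomputed from the current features as k nearest neighbors, and an SGAlign-style step that resamples a candidate segment's features to a fixed length by linear interpolation. The function names (`gcnext_like_block`, `sgalign_like`, etc.), the edge-convolution update rule, the single random weight matrices, and the sampling scheme are all assumptions chosen for brevity; the actual GCNeXt and SGAlign layers differ in detail.

```python
import numpy as np

# Hypothetical, simplified sketch of the ideas named in the abstract.
# It is NOT the G-TAD reference implementation; names and update rules are assumptions.


def temporal_neighbors(num_nodes):
    """Temporal edges: link each snippet to its immediate predecessor and successor."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < num_nodes] for i in range(num_nodes)]


def semantic_neighbors(features, k):
    """Semantic edges: indices of the k nearest nodes in feature space (self excluded).

    Calling this on each layer's current features re-wires the graph, so the edges are dynamic.
    """
    sq = np.sum(features ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)  # a node is never its own semantic neighbor
    return np.argsort(dist, axis=1)[:, :k]


def edge_conv(features, neighbors, weight):
    """For each node i, take the max over neighbors j of ReLU(W @ [x_i, x_j - x_i])."""
    out = []
    for i, nbrs in enumerate(neighbors):
        x_i = features[i]
        msgs = [np.maximum(0.0, weight @ np.concatenate([x_i, features[j] - x_i])) for j in nbrs]
        out.append(np.max(msgs, axis=0))
    return np.stack(out)


def gcnext_like_block(features, k, w_temporal, w_semantic):
    """Residual sum of a temporal stream and a semantic stream, followed by ReLU."""
    temporal = edge_conv(features, temporal_neighbors(len(features)), w_temporal)
    semantic = edge_conv(features, semantic_neighbors(features, k), w_semantic)
    return np.maximum(0.0, features + temporal + semantic)


def sgalign_like(features, t_start, t_end, num_samples):
    """Resample the segment [t_start, t_end] (in snippet units) to num_samples feature vectors."""
    last = len(features) - 1
    positions = np.clip(np.linspace(t_start, t_end, num_samples), 0, last)
    lo = np.floor(positions).astype(int)
    hi = np.minimum(lo + 1, last)
    frac = (positions - lo)[:, None]  # interpolation weight toward the right-hand snippet
    return (1.0 - frac) * features[lo] + frac * features[hi]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D = 20, 8  # 20 snippets with 8-dimensional features (arbitrary toy sizes)
    x = rng.standard_normal((T, D))
    w_t = 0.1 * rng.standard_normal((D, 2 * D))  # random weights stand in for learned ones
    w_s = 0.1 * rng.standard_normal((D, 2 * D))
    h = gcnext_like_block(x, k=3, w_temporal=w_t, w_semantic=w_s)
    print(h.shape)  # (20, 8)
    segment = sgalign_like(h, 4.0, 11.5, num_samples=16)
    print(segment.shape)  # (16, 8)
```

Because the semantic neighbors are recomputed from whatever features enter each block, stacking several such blocks lets the graph topology change with depth, which is one plausible reading of "dynamically updates the edges." Resampling every candidate segment to the same number of points is what makes segments of different lengths comparable as fixed-size vectors.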


Pages: 10153-10162

Page count: 10

References: 57
