Enriching Local and Global Contexts for Temporal Action Localization

Cited by: 76
Authors
Zhu, Zixin [1 ]
Tang, Wei [2 ]
Wang, Le [1 ]
Zheng, Nanning [1 ]
Hua, Gang [3 ]
Affiliations
[1] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian, Peoples R China
[2] Univ Illinois, Chicago, IL USA
[3] Wormpex AI Res, Bellevue, WA 98004 USA
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
Funding
National Key R&D Program of China;
Keywords
ACTION RECOGNITION;
D O I
10.1109/ICCV48922.2021.01326
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Effectively tackling the problem of temporal action localization (TAL) necessitates a visual representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for temporal localization and sufficient visual invariance for action classification. We address this challenge by enriching both the local and global contexts in the popular two-stage temporal localization framework, where action proposals are first generated and then followed by action classification and temporal boundary regression. Our proposed model, dubbed ContextLoc, can be divided into three subnetworks: L-Net, G-Net and P-Net. L-Net enriches the local context via fine-grained modeling of snippet-level features, which is formulated as a query-and-retrieval process. G-Net enriches the global context via higher-level modeling of the video-level representation. In addition, we introduce a novel context adaptation module to adapt the global context to different proposals. P-Net further models the context-aware inter-proposal relations. We explore two existing models as the P-Net in our experiments. The efficacy of our proposed method is validated by experimental results on the THUMOS14 (54.3% at tIoU@0.5) and ActivityNet v1.3 (56.01% at tIoU@0.5) datasets, outperforming recent state-of-the-art methods. Code is available at https://github.com/buxiangzhiren/ContextLoc.
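The two context-enrichment ideas the abstract describes can be illustrated schematically. The sketch below is a hypothetical simplification, not the paper's implementation: `enrich_local` mimics L-Net's query-and-retrieval step (a proposal-level query attends over its snippet-level features, and the retrieved context is fused residually), and `adapt_global` mimics the context adaptation module (the video-level feature is gated by each proposal before fusion). All function names, the dot-product attention, and the sigmoid gate are assumptions made for illustration.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def enrich_local(proposal_query, snippet_feats):
    """Query-and-retrieval (hypothetical stand-in for L-Net):
    the proposal-level query attends over snippet-level features,
    and the retrieved context is added back residually."""
    scores = softmax([dot(proposal_query, s) for s in snippet_feats])
    d = len(proposal_query)
    retrieved = [sum(w * s[i] for w, s in zip(scores, snippet_feats))
                 for i in range(d)]
    return [q + r for q, r in zip(proposal_query, retrieved)]

def adapt_global(global_feat, proposal_feat):
    """Context adaptation (hypothetical stand-in for the paper's module):
    gate the video-level context by its affinity with the proposal,
    so different proposals receive differently weighted global context."""
    gate = 1.0 / (1.0 + math.exp(-dot(global_feat, proposal_feat)))
    return [p + gate * g for p, g in zip(proposal_feat, global_feat)]
```

For example, a proposal query aligned with one snippet retrieves mostly that snippet's feature, so its enriched representation moves toward the locally relevant content; the gated global feature then adds video-level context at a strength that varies per proposal.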
Pages: 13496-13505 (10 pages)