Enriching Local and Global Contexts for Temporal Action Localization

Cited by: 76
Authors
Zhu, Zixin [1 ]
Tang, Wei [2 ]
Wang, Le [1 ]
Zheng, Nanning [1 ]
Hua, Gang [3 ]
Affiliations
[1] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian, Peoples R China
[2] Univ Illinois, Chicago, IL USA
[3] Wormpex AI Res, Bellevue, WA 98004 USA
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
Funding
National Key R&D Program of China;
Keywords
ACTION RECOGNITION;
D O I
10.1109/ICCV48922.2021.01326
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Effectively tackling the problem of temporal action localization (TAL) necessitates a visual representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for temporal localization and sufficient visual invariance for action classification. We address this challenge by enriching both the local and global contexts in the popular two-stage temporal localization framework, where action proposals are first generated and then followed by action classification and temporal boundary regression. Our proposed model, dubbed ContextLoc, can be divided into three subnetworks: L-Net, G-Net and P-Net. L-Net enriches the local context via fine-grained modeling of snippet-level features, which is formulated as a query-and-retrieval process. G-Net enriches the global context via higher-level modeling of the video-level representation. In addition, we introduce a novel context adaptation module to adapt the global context to different proposals. P-Net further models the context-aware inter-proposal relations. We explore two existing models as the P-Net in our experiments. The efficacy of our proposed method is validated by experimental results on the THUMOS14 (54.3% at tIoU@0.5) and ActivityNet v1.3 (56.01% at tIoU@0.5) datasets, outperforming recent state-of-the-art methods. Code is available at https://github.com/buxiangzhiren/ContextLoc.
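The two context-enrichment ideas the abstract describes can be illustrated schematically. The sketch below is a hypothetical simplification, not the paper's implementation: `enrich_local` mimics L-Net's query-and-retrieval step (a proposal-level query attends over its snippet-level features, and the retrieved context is fused residually), and `adapt_global` mimics the context adaptation module (the video-level feature is gated by each proposal before fusion). All function names, the dot-product attention, and the sigmoid gate are assumptions made for illustration.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def enrich_local(proposal_query, snippet_feats):
    """Query-and-retrieval (hypothetical stand-in for L-Net):
    the proposal-level query attends over snippet-level features,
    and the retrieved context is added back residually."""
    scores = softmax([dot(proposal_query, s) for s in snippet_feats])
    d = len(proposal_query)
    retrieved = [sum(w * s[i] for w, s in zip(scores, snippet_feats))
                 for i in range(d)]
    return [q + r for q, r in zip(proposal_query, retrieved)]

def adapt_global(global_feat, proposal_feat):
    """Context adaptation (hypothetical stand-in for the paper's module):
    gate the video-level context by its affinity with the proposal,
    so different proposals receive differently weighted global context."""
    gate = 1.0 / (1.0 + math.exp(-dot(global_feat, proposal_feat)))
    return [p + gate * g for p, g in zip(proposal_feat, global_feat)]
```

For example, a proposal query aligned with one snippet retrieves mostly that snippet's feature, so its enriched representation moves toward the locally relevant content; the gated global feature then adds video-level context at a strength that varies per proposal.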
Pages: 13496-13505 (10 pages)