Enriching Local and Global Contexts for Temporal Action Localization

Cited by: 76
Authors
Zhu, Zixin [1]
Tang, Wei [2]
Wang, Le [1]
Zheng, Nanning [1]
Hua, Gang [3]
Affiliations
[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an, People's Republic of China
[2] University of Illinois, Chicago, IL, USA
[3] Wormpex AI Research, Bellevue, WA 98004, USA
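Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
Funding
National Key Research and Development Program of China
Keywords
ACTION RECOGNITION
DOI
10.1109/ICCV48922.2021.01326
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract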
Effectively tackling the problem of temporal action localization (TAL) necessitates a visual representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for temporal localization and sufficient visual invariance for action classification. We address this challenge by enriching both the local and global contexts in the popular two-stage temporal localization framework, in which action proposals are first generated and then passed to action classification and temporal boundary regression. Our proposed model, dubbed ContextLoc, can be divided into three subnetworks: L-Net, G-Net and P-Net. L-Net enriches the local context via fine-grained modeling of snippet-level features, which is formulated as a query-and-retrieval process. G-Net enriches the global context via higher-level modeling of the video-level representation. In addition, we introduce a novel context adaptation module to adapt the global context to different proposals. P-Net further models the context-aware inter-proposal relations. We explore two existing models as the P-Net in our experiments. The efficacy of our proposed method is validated by experimental results on the THUMOS14 (54.3% at tIoU@0.5) and ActivityNet v1.3 (56.01% at tIoU@0.5) datasets, where it outperforms recent state-of-the-art methods. Code is available at https://github.com/buxiangzhiren/ContextLoc.
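The "tIoU@0.5" figures in the abstract refer to a temporal intersection-over-union threshold of 0.5 between a predicted segment and a ground-truth segment. The sketch below illustrates that overlap measure only. It is not taken from the ContextLoc repository, and the function names `temporal_iou` and `is_correct`, as well as the (start, end) tuple representation in seconds, are assumptions made for illustration.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments, e.g. in seconds."""
    (ps, pe), (gs, ge) = pred, gt
    # Overlap length is zero when the segments do not intersect.
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred, gt, threshold=0.5):
    """Hypothetical helper: a prediction matches at tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold


# A prediction of 2 s to 8 s against a ground truth of 4 s to 10 s
# overlaps for 4 s out of a combined span of 8 s.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # 0.5
```

Under this definition the example prediction sits exactly on the 0.5 threshold, so it would count as a match at tIoU@0.5 but not at any stricter threshold.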
Pages: 13496-13505
Page count: 10