Exploring frame segmentation networks for temporal action localization

Cited by: 7
Authors
Yang, Ke [1 ]
Shen, Xiaolong [1 ]
Qiao, Peng [1 ]
Li, Shijie [1 ]
Li, Dongsheng [1 ]
Dou, Yong [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Natl Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action detection; Temporal action localization; Convolutional neural network
DOI
10.1016/j.jvcir.2019.02.003
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Temporal action localization is an important task in computer vision. Although many methods have been proposed, how to precisely predict the temporal boundaries of action segments remains an open question. Most state-of-the-art works train action classifiers on video segments pre-determined by action proposals. However, recent work has found that a desirable model should move beyond segment-level predictions and make dense predictions at a fine temporal granularity to determine precise boundaries. In this paper, we propose a Frame Segmentation Network (FSN) that places a temporal CNN on top of 2D spatial CNNs. The spatial CNNs abstract semantics in the spatial dimension, while the temporal CNN introduces temporal context information and performs dense predictions. The proposed FSN can make dense frame-level predictions for a video clip using both spatial and temporal context information. FSN is trained in an end-to-end manner, so the model can be optimized jointly in the spatial and temporal domains. We also adapt FSN to a weakly supervised scenario (WFSN), in which only video-level labels are provided during training. Experimental results on public datasets show that FSN achieves superior performance in both frame-level action localization and temporal action localization. (C) 2019 Elsevier Inc. All rights reserved.
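The abstract describes the FSN architecture only at a high level: 2D CNNs extract per-frame spatial features, and a temporal CNN on top of them emits dense frame-level class predictions. The PyTorch sketch below illustrates that layout under assumptions of my own; the backbone, layer sizes, and the FrameSegmentationNetwork class name are illustrative stand-ins, not the paper's actual implementation.

import torch
import torch.nn as nn

class FrameSegmentationNetwork(nn.Module):
    # FSN-style layout (illustrative, not the paper's exact model):
    # a 2D spatial CNN applied to each frame, then a 1D temporal CNN
    # over the frame-feature sequence that emits one class-score
    # vector per frame, i.e. a dense frame-level prediction.
    def __init__(self, num_classes, feat_dim=512):
        super().__init__()
        # Tiny stand-in spatial backbone; the paper would use a
        # standard image CNN here.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
            nn.ReLU(),
        )
        # Temporal 1D CNN; padding keeps the sequence length so every
        # frame receives a prediction.
        self.temporal = nn.Sequential(
            nn.Conv1d(feat_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, num_classes, kernel_size=3, padding=1),
        )

    def forward(self, clip):
        # clip: (batch, time, channels, height, width)
        b, t, c, h, w = clip.shape
        feats = self.spatial(clip.reshape(b * t, c, h, w))   # (b*t, feat_dim)
        feats = feats.reshape(b, t, -1).transpose(1, 2)      # (b, feat_dim, t)
        return self.temporal(feats).transpose(1, 2)          # (b, t, num_classes)

# A 16-frame clip yields 16 per-frame score vectors.
scores = FrameSegmentationNetwork(num_classes=21)(torch.randn(2, 16, 3, 112, 112))
print(scores.shape)  # torch.Size([2, 16, 21])

Because the whole stack is differentiable, a per-frame classification loss on the output would optimize the spatial and temporal parts jointly, which is the end-to-end property the abstract emphasizes.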
Pages: 296-302
Number of pages: 7