Coarse-to-Fine Localization of Temporal Action Proposals

Cited by: 28

Authors:

Long, Fuchen [1]

Yao, Ting [2]

Qiu, Zhaofan [1]

Tian, Xinmei [1]

Mei, Tao [2]

Luo, Jiebo [3]

Affiliations:

[1] Univ Sci & Technol China, Elect Engn & Informat Sci, Hefei 230027, Peoples R China

[2] JD AI Res, Vis & Multimedia Lab, Beijing 100105, Peoples R China

[3] Univ Rochester, Dept Comp Sci, Rochester, NY 14604 USA

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2020 / Vol. 22 / No. 06

Keywords:

Proposals; Videos; Painting; Brushes; Microsoft Windows; Task analysis; Feature extraction; Action Proposals; Action Recognition; Action Detection; Video Captioning;

DOI:

10.1109/TMM.2019.2943204

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Localizing temporal action proposals from long videos is a fundamental challenge in video analysis (e.g., action detection and recognition or dense video captioning). Most existing approaches overlook the hierarchical granularities of actions and thus fail to discriminate fine-grained action proposals (e.g., hand washing laundry or changing a tire in vehicle repair). In this paper, we propose a novel coarse-to-fine temporal proposal (CFTP) approach to localize temporal action proposals by exploring different action granularities. Our proposed CFTP consists of three stages: a coarse proposal network (CPN) to generate long action proposals, a temporal convolutional anchor network (CAN) to localize finer proposals, and a proposal reranking network (PRN) to further identify proposals from previous stages. Specifically, CPN explores three complementary actionness curves (namely pointwise, pairwise, and recurrent curves) that represent actions at different levels for generating coarse proposals, while CAN refines these proposals by a multiscale cascaded 1D-convolutional anchor network. In contrast to existing works, our coarse-to-fine approach can progressively localize fine-grained action proposals. We conduct extensive experiments on two action benchmarks (THUMOS14 and ActivityNet v1.3) and demonstrate the superior performance of our approach when compared to the state-of-the-art techniques on various video understanding tasks.
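As a rough illustration of how an actionness curve can be turned into coarse proposals, the sketch below groups consecutive snippets whose score stays at or above a threshold. The abstract does not say how CPN groups its curves, so the function name `group_actionness`, the fixed threshold, and the `min_len` parameter are all assumptions made for illustration, not the paper's procedure.

```python
from typing import List, Tuple


def group_actionness(
    scores: List[float],
    threshold: float = 0.5,   # assumed cut-off, not taken from the paper
    min_len: int = 1,         # assumed minimum proposal length in snippets
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index spans where scores >= threshold."""
    proposals: List[Tuple[int, int]] = []
    start = None  # index where the current above-threshold run began
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # a new run begins
        elif s < threshold and start is not None:
            if i - start >= min_len:
                proposals.append((start, i))  # close the run
            start = None
    # Close a run that extends to the end of the video.
    if start is not None and len(scores) - start >= min_len:
        proposals.append((start, len(scores)))
    return proposals


# Hypothetical per-snippet actionness scores for one video.
scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.7, 0.6, 0.1]
print(group_actionness(scores))  # [(1, 3), (4, 7)]
```

In a coarse-to-fine setting, spans like these would only be a starting point; a later stage such as CAN would refine their boundaries.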

Citation

Pages: 1577-1590

Page count: 14

References: 54 in total
