Multi-Scale Proposal Regression Network for Temporal Action Proposal Generation

Cited by: 5

Authors:

Zheng, Jingye [1]

Chen, Dihu [1]

Hu, Haifeng [1]

Affiliations:

[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510006, Peoples R China

Source:

IEEE ACCESS | 2019 / Vol. 7

Funding:

National Natural Science Foundation of China;

Keywords:

Convolutional neural network; temporal action detection; temporal action proposal generation; video analysis;

DOI:

10.1109/ACCESS.2019.2933360

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Temporal action detection, a branch of video analysis, aims to locate the time points at which actions start and end and to classify the actions that occur in videos into the correct categories. Generating high-quality proposals is a key step in the temporal action detection task. In this paper, we introduce a novel network, named multi-scale proposal regression network (MPRN), for temporal action proposal generation. First, we take encoded visual features as input and predict action scores for time points, then group these time points to generate rough proposals. Next, we regress the proposals' boundaries via our multi-scale proposal regression network to obtain more precise proposals. Compared with SSN and TURN, our multi-scale regression segments are characterized by flexible boundaries. Experiments show that (1) our method outperforms other proposal generation methods on the THUMOS-14 and ActivityNet-v1.3 datasets; (2) the effectiveness of our method is due to its architecture, not to the choice of visual feature encoder; and (3) our method can generate temporal proposals for unseen action classes, which demonstrates its good generalization ability.
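
The first stage described in the abstract (scoring time points, then grouping them into rough proposals) can be illustrated with a minimal sketch. The grouping rule below, a fixed threshold applied to runs of consecutive time points, is an assumption made for illustration; the abstract does not specify how MPRN groups time points, and the name group_scores and the threshold value 0.5 are hypothetical.

```python
from typing import List, Tuple


def group_scores(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group consecutive time points with score >= threshold into [start, end) spans.

    Illustrative only: the thresholding rule is an assumption, not the
    grouping procedure used by MPRN.
    """
    proposals: List[Tuple[int, int]] = []
    start = None  # index where the current run of high scores began
    for t, s in enumerate(scores):
        if s >= threshold and start is None:
            start = t  # a new run begins
        elif s < threshold and start is not None:
            proposals.append((start, t))  # the run ended just before t
            start = None
    if start is not None:
        # the final run extends to the end of the sequence
        proposals.append((start, len(scores)))
    return proposals


# Made-up per-time-point action scores for a short clip.
scores = [0.1, 0.7, 0.9, 0.8, 0.2, 0.1, 0.6, 0.95]
rough = group_scores(scores)
print(rough)
```

Each returned pair is a rough proposal in time-point indices. In the pipeline the abstract describes, a second stage would then regress these boundaries to refine them; that stage is not sketched here.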

Citation

Pages: 183860-183868

Page count: 9

31 entries in total

[1] [Anonymous], 2016, PROC CVPR IEEE, DOI 10.1109/CVPR.2016.214
[2] Soft-NMS - Improving Object Detection With One Line of Code
Bodla, Navaneeth
Singh, Bharat
Chellappa, Rama
Davis, Larry S.
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5562 - 5570
[3] Buch S., 2017, P BRIT MACH VIS C BM
[4] SST: Single-Stream Temporal Action Proposals
Buch, Shyamal
Escorcia, Victor
Shen, Chuanqi
Ghanem, Bernard
Niebles, Juan Carlos
[J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6373 - 6382
[5] Rethinking the Faster R-CNN Architecture for Temporal Action Localization
Chao, Yu-Wei
Vijayanarasimhan, Sudheendra
Seybold, Bryan
Ross, David A.
Deng, Jia
Sukthankar, Rahul
[J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 1130 - 1139
[6] Learning Spatiotemporal Features with 3D Convolutional Networks
Du Tran
Bourdev, Lubomir
Fergus, Rob
Torresani, Lorenzo
Paluri, Manohar
[J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 4489 - 4497
[7] DAPs: Deep Action Proposals for Action Understanding
Escorcia, Victor
Heilbron, Fabian Caba
Niebles, Juan Carlos
Ghanem, Bernard
[J]. COMPUTER VISION - ECCV 2016, PT III, 2016, 9907 : 768 - 784
[8] Convolutional Two-Stream Network Fusion for Video Action Recognition
Feichtenhofer, Christoph
Pinz, Axel
Zisserman, Andrew
[J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 1933 - 1941
[9] Gao J., 2017, BMVC, P1
[10] TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals
Gao, Jiyang
Yang, Zhenheng
Sun, Chen
Chen, Kan
Nevatia, Ram
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 3648 - 3656
