Multi-Scale Proposal Regression Network for Temporal Action Proposal Generation

Cited by: 5
Authors
Zheng, Jingye [1]
Chen, Dihu [1]
Hu, Haifeng [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510006, Peoples R China
Source
IEEE ACCESS | 2019 / Vol. 7
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural network; temporal action detection; temporal action proposal generation; video analysis;
DOI
10.1109/ACCESS.2019.2933360
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Temporal action detection, a branch of video analysis, aims to locate the time points at which actions start and end, and to classify the actions occurring in videos into the correct categories. Generating high-quality proposals is a key step in the temporal action detection task. In this paper, we introduce a novel network, named the multi-scale proposal regression network (MPRN), for temporal action proposal generation. First, we take encoded visual features as input and predict action scores for time points, which are then grouped to generate rough proposals. Then, we regress the proposals' boundaries via our multi-scale proposal regression network to obtain more precise proposals. Compared with SSN and TURN, our multi-scale regression segments are characterized by flexible boundaries. Experiments show that 1) our method outperforms other proposal generation methods on the THUMOS-14 and ActivityNet-v1.3 datasets; 2) the effectiveness of our method is due to its architecture, not the choice of visual feature encoder; and 3) our method can generate temporal proposals for unseen action classes, which demonstrates its good generalization ability.
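The first stage described in the abstract (score each time point, then group points into rough proposals) can be illustrated with a minimal sketch. The abstract does not specify the grouping rule, so the version below assumes a simple one: consecutive time points whose score meets a fixed threshold form one proposal. The function name `group_rough_proposals`, the threshold of 0.5, and the sample scores are all hypothetical and are not taken from the paper; the boundary regression stage is not shown.

```python
from typing import List, Tuple


def group_rough_proposals(
    scores: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Group consecutive time points with score >= threshold.

    Assumption: a plain thresholding rule stands in for the paper's
    grouping step, which the abstract does not describe in detail.
    Returns (start, end) index pairs with an inclusive end index.
    """
    proposals: List[Tuple[int, int]] = []
    start = None  # index where the current run of high scores began
    for t, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = t  # open a new rough proposal
        elif start is not None:
            proposals.append((start, t - 1))  # close the current run
            start = None
    if start is not None:
        # The last run extends to the final time point.
        proposals.append((start, len(scores) - 1))
    return proposals


# Hypothetical per-time-point action scores for a short video.
scores = [0.1, 0.7, 0.9, 0.8, 0.2, 0.1, 0.6, 0.7, 0.3]
print(group_rough_proposals(scores))  # prints [(1, 3), (6, 7)]
```

In a pipeline of this kind, the rough (start, end) pairs would then be passed to a regression stage that refines the boundaries, which is the role the abstract assigns to MPRN.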
Pages: 183860 - 183868
Page count: 9
Related papers
31 in total
  • [1] [Anonymous], 2016, PROC CVPR IEEE, DOI 10.1109/CVPR.2016.214
  • [2] Soft-NMS - Improving Object Detection With One Line of Code
    Bodla, Navaneeth
    Singh, Bharat
    Chellappa, Rama
    Davis, Larry S.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5562 - 5570
  • [3] Buch S., 2017, P BRIT MACH VIS C BM
  • [4] SST: Single-Stream Temporal Action Proposals
    Buch, Shyamal
    Escorcia, Victor
    Shen, Chuanqi
    Ghanem, Bernard
    Niebles, Juan Carlos
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6373 - 6382
  • [5] Rethinking the Faster R-CNN Architecture for Temporal Action Localization
    Chao, Yu-Wei
    Vijayanarasimhan, Sudheendra
    Seybold, Bryan
    Ross, David A.
    Deng, Jia
    Sukthankar, Rahul
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 1130 - 1139
  • [6] Learning Spatiotemporal Features with 3D Convolutional Networks
    Du Tran
    Bourdev, Lubomir
    Fergus, Rob
    Torresani, Lorenzo
    Paluri, Manohar
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 4489 - 4497
  • [7] DAPs: Deep Action Proposals for Action Understanding
    Escorcia, Victor
    Heilbron, Fabian Caba
    Niebles, Juan Carlos
    Ghanem, Bernard
    [J]. COMPUTER VISION - ECCV 2016, PT III, 2016, 9907 : 768 - 784
  • [8] Convolutional Two-Stream Network Fusion for Video Action Recognition
    Feichtenhofer, Christoph
    Pinz, Axel
    Zisserman, Andrew
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 1933 - 1941
  • [9] Gao J., 2017, BMVC, P1
  • [10] TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals
    Gao, Jiyang
    Yang, Zhenheng
    Sun, Chen
    Chen, Kan
    Nevatia, Ram
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 3648 - 3656