STEP: Spatio-Temporal Progressive Learning for Video Action Detection

Times cited: 95
Authors
Yang, Xitong [1, 4]
Yang, Xiaodong [2 ]
Liu, Ming-Yu [2 ]
Xiao, Fanyi [3, 4]
Davis, Larry [1 ]
Kautz, Jan [2 ]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] NVIDIA, Santa Clara, CA USA
[3] Univ Calif Davis, Davis, CA 95616 USA
[4] NVIDIA Res, Santa Clara, CA USA
DOI
10.1109/CVPR.2019.00035
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose the Spatio-TEmporal Progressive (STEP) action detector, a progressive learning framework for spatio-temporal action detection in videos. Starting from a handful of coarse-scale proposal cuboids, our approach progressively refines the proposals towards actions over a few steps. In this way, high-quality proposals (i.e., proposals that adhere to action movements) can be gradually obtained at later steps by leveraging the regression outputs from previous steps. At each step, we adaptively extend the proposals in time to incorporate more related temporal context. Compared with prior work that performs action detection in a single run, our progressive learning framework naturally handles the spatial displacement within action tubes and therefore provides a more effective way to model spatio-temporal information. We extensively evaluate our approach on UCF101 and AVA and demonstrate superior detection results. Notably, with 3 progressive steps, we achieve mAP of 75.0% on UCF101 and 18.6% on AVA, using only 11 and 34 initial proposals, respectively.
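The progressive refinement loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The R-CNN-style (dx, dy, dw, dh) box decoding, the fixed-padding temporal extension (a simple stand-in for the paper's adaptive extension), the one-offset-per-frame regressor, and the names `apply_deltas`, `extend_in_time`, `refine_progressively` and `toy_regressor` are all hypothetical assumptions. The sketch only shows the control flow: each step extends a cuboid in time and then regresses its boxes, starting from the previous step's output.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumption: a box is (x1, y1, x2, y2); a cuboid is one box per frame.
Box = Tuple[float, float, float, float]
Deltas = Tuple[float, float, float, float]
Regressor = Callable[[List[Box], int], Sequence[Deltas]]


def apply_deltas(box: Box, d: Deltas) -> Box:
    """Decode R-CNN-style (dx, dy, dw, dh) offsets relative to the box size."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = d
    cx, cy = cx + dx * w, cy + dy * h          # shift the box center
    w, h = w * math.exp(dw), h * math.exp(dh)  # rescale width and height
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def extend_in_time(cuboid: List[Box], k: int) -> List[Box]:
    """Hypothetical stand-in for adaptive extension: repeat the first and
    last boxes k times so the cuboid covers k more frames on each side."""
    return [cuboid[0]] * k + cuboid + [cuboid[-1]] * k


def refine_progressively(cuboid: List[Box], regressor: Regressor,
                         num_steps: int = 3, k: int = 1) -> List[Box]:
    """At each step, extend the cuboid in time, then refine every frame's
    box with the regressor's per-frame offsets. Each step starts from the
    previous step's regressed boxes."""
    for step in range(num_steps):
        cuboid = extend_in_time(cuboid, k)
        deltas = regressor(cuboid, step)
        cuboid = [apply_deltas(b, d) for b, d in zip(cuboid, deltas)]
    return cuboid


def toy_regressor(cuboid: List[Box], step: int) -> List[Deltas]:
    """Hypothetical regressor: shift every box right by 10% of its width."""
    return [(0.1, 0.0, 0.0, 0.0)] * len(cuboid)


result = refine_progressively([(0.0, 0.0, 10.0, 10.0)] * 2, toy_regressor)
print(len(result), result[0])  # 8 (3.0, 0.0, 13.0, 10.0)
```

Starting from a 2-frame cuboid, three steps with k = 1 grow it to 8 frames. Each step moves every box 1 pixel to the right, because the toy regressor's offset is applied to the already-refined box from the previous step. In a real detector, the regressor would be a learned network conditioned on features pooled from the current cuboid.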
Pages: 264-272
Number of pages: 9