STEP: Spatio-Temporal Progressive Learning for Video Action Detection

Cited by: 95
Authors
Yang, Xitong [1, 4]
Yang, Xiaodong [2]
Liu, Ming-Yu [2]
Xiao, Fanyi [3, 4]
Davis, Larry [1]
Kautz, Jan [2]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] NVIDIA, Santa Clara, CA USA
[3] Univ Calif Davis, Davis, CA 95616 USA
[4] NVIDIA Res, Santa Clara, CA USA
DOI
10.1109/CVPR.2019.00035
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose the Spatio-TEmporal Progressive (STEP) action detector, a progressive learning framework for spatio-temporal action detection in videos. Starting from a handful of coarse-scale proposal cuboids, our approach progressively refines the proposals towards actions over a few steps. In this way, high-quality proposals (i.e., ones that adhere to action movements) can be gradually obtained at later steps by leveraging the regression outputs from previous steps. At each step, we adaptively extend the proposals in time to incorporate more related temporal context. Compared to prior work that performs action detection in one run, our progressive learning framework naturally handles the spatial displacement within action tubes and therefore provides a more effective way for spatio-temporal modeling. We extensively evaluate our approach on UCF101 and AVA, and demonstrate superior detection results. Remarkably, we achieve mAP of 75.0% and 18.6% on the two datasets, respectively, with 3 progressive steps and only 11 and 34 initial proposals.
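The progressive refinement described in the abstract can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the names `refine_progressively` and `make_toy_regressor`, the box format, the fixed temporal extension per step, and the "move halfway to a target" regressor are all hypothetical stand-ins for the learned network and the adaptive extension in the paper.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def make_toy_regressor(target: Box) -> Callable[[Box, int], Box]:
    """Hypothetical stand-in for a learned regressor.

    It moves each box halfway toward a fixed target and ignores the clip
    length; a real model would predict offsets from video features.
    """
    def regress(box: Box, clip_len: int) -> Box:
        return tuple(b + 0.5 * (t - b) for b, t in zip(box, target))
    return regress


def refine_progressively(
    proposals: List[Box],
    regress: Callable[[Box, int], Box],
    num_steps: int = 3,
    base_len: int = 1,
    extend: int = 1,
) -> Tuple[List[Box], List[int]]:
    """Refine coarse proposals over a few steps.

    Each step feeds the previous step's regression outputs back in as the
    new proposals, then widens the temporal window by `extend` units on
    each side (a fixed rule assumed here for illustration).
    """
    clip_len = base_len
    clip_lens_used = []
    for _ in range(num_steps):
        clip_lens_used.append(clip_len)
        proposals = [regress(box, clip_len) for box in proposals]
        clip_len += 2 * extend
    return proposals, clip_lens_used


if __name__ == "__main__":
    coarse = [(0.0, 0.0, 100.0, 100.0)]
    regressor = make_toy_regressor((40.0, 40.0, 80.0, 80.0))
    refined, lens = refine_progressively(coarse, regressor, num_steps=3)
    print(refined, lens)
```

In this toy setting each step halves the remaining gap to the target, so later steps start from better proposals than earlier ones, which is the intuition behind reusing regression outputs across steps.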
Pages: 264-272
Page count: 9