Detecting action tubes via spatial action estimation and temporal path inference

Cited by: 4
Authors
Li, Nannan [1 ]
Huang, Jingjia [1 ]
Li, Thomas [2 ]
Guo, Huiwen [3 ]
Li, Ge [1 ]
Affiliations
[1] Peking Univ, Shenzhen Grad Sch, Sch Elect & Comp Engn, Beijing, Peoples R China
[2] Gpower Semicond Inc, Suzhou, Peoples R China
[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Action detection; Spatial localization; Region proposal network; Tracking-by-detection; SUM-PRODUCT NETWORKS; ACTION RECOGNITION;
DOI
10.1016/j.neucom.2018.05.033
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we address the problem of action detection in unconstrained video clips. Our approach first detects actions on object proposals in each frame, then aggregates the frame-level detections belonging to the same actor across the whole video via linking, associating, and tracking to generate action tubes that are spatially compact and temporally continuous. To this end, we first propose a novel two-stream action detection model that fuses features from appearance and motion cues and can be trained end-to-end. The association of action paths is then formulated as a maximum set coverage problem, with the action detection results serving as a prior. We use an incremental search algorithm to obtain all action proposals in a single pass, which is especially efficient for long videos or videos containing multiple action instances. Finally, a tracking-by-detection scheme is designed to further refine the generated action paths. Extensive experiments on three validation datasets, UCF-Sports, UCF-101 and J-HMDB, show that the proposed approach advances state-of-the-art action detection performance in terms of both accuracy and proposal quality. © 2018 Elsevier B.V. All rights reserved.
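The abstract describes linking frame-level detections of the same actor into temporally continuous paths. As a rough illustration only, the sketch below uses a generic Viterbi-style linking over per-frame boxes, scoring a path by its detection scores plus the overlap (IoU) between consecutive boxes. This is an assumption made for illustration: it is not the maximum set coverage formulation or the incremental search described in the abstract, and the names `iou`, `link_best_path`, and `overlap_weight` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, action score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_best_path(frames: List[List[Detection]],
                   overlap_weight: float = 1.0) -> List[int]:
    """Return, for each frame, the index of the detection on the best path.

    Path score (an illustrative choice, not taken from the paper):
    sum of detection scores + overlap_weight * sum of IoUs between
    boxes chosen in consecutive frames.
    """
    if not frames or any(len(f) == 0 for f in frames):
        raise ValueError("every frame needs at least one detection")

    # best[i] = best cumulative score of a path ending at detection i
    best = [score for _, score in frames[0]]
    back: List[List[int]] = []  # back-pointers for frames 1..T-1

    for t in range(1, len(frames)):
        new_best, pointers = [], []
        for box, score in frames[t]:
            # Score of extending each previous path to this detection.
            cands = [best[j] + overlap_weight * iou(prev_box, box)
                     for j, (prev_box, _) in enumerate(frames[t - 1])]
            j_star = max(range(len(cands)), key=cands.__getitem__)
            new_best.append(cands[j_star] + score)
            pointers.append(j_star)
        best = new_best
        back.append(pointers)

    # Backtrack from the best final detection.
    idx = max(range(len(best)), key=best.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return path
```

Given per-frame lists of `(box, score)` pairs, `link_best_path` returns one detection index per frame. Extracting several paths from the same video, trimming them in time, or refining them with a tracker would need further steps that this sketch does not attempt.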
Pages: 65-77
Page count: 13