Learning Spatial and Temporal Extents of Human Actions for Action Detection

Cited by: 38
Authors
Zhou, Zhong [1]
Shi, Feng [1]
Wu, Wei [1]
Affiliation
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Keywords
Action localization; action recognition; discriminative latent variable model; split-and-merge; FRAMEWORK; MODELS;
DOI
10.1109/TMM.2015.2404779
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
For the problem of action detection, most existing methods require that relevant portions of the action of interest in training videos be manually annotated with bounding boxes. Some recent works have tried to avoid this tedious manual annotation by automatically identifying the relevant portions in training videos. However, these methods only address identification in either the spatial or the temporal domain, and may therefore include irrelevant content from the other domain. Such irrelevant content is undesirable in the training phase and leads to degraded detection performance. This paper advances prior work by proposing a joint learning framework that simultaneously identifies the spatial and temporal extents of the action of interest in training videos. To obtain pixel-level localization results, our method uses dense trajectories extracted from videos as local features to represent actions. We first present a trajectory split-and-merge algorithm to segment a video into the background and several separated foreground moving objects. In this algorithm, the inherent temporal smoothness of human actions is exploited to facilitate segmentation. Then, applying the latent SVM framework to the segmentation results, the spatial and temporal extents of the action of interest are treated as latent variables that are inferred jointly with action recognition. Experiments on two challenging datasets show that action detection with our learned spatial and temporal extents is superior to state-of-the-art methods.
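The core inference step described in the abstract can be sketched as follows: in a latent SVM, each candidate spatio-temporal extent (e.g. one foreground object from the split-and-merge segmentation paired with a temporal window) is a latent variable, and inference picks the candidate whose features score highest under a linear model. This is a minimal illustrative sketch of that generic latent-variable maximization, not the paper's actual implementation; the function names and toy features are hypothetical.

```python
import numpy as np

def score_extent(w, phi):
    """Linear score w . phi(x, z) for one candidate extent z."""
    return float(np.dot(w, phi))

def infer_extent(w, candidates):
    """Latent-variable inference: return the candidate extent with the
    highest linear score, as in argmax_z w . phi(x, z).

    candidates: list of (extent_id, feature_vector) pairs, e.g. one per
    (foreground trajectory group, temporal window) from segmentation.
    """
    best_id, best_score = None, float("-inf")
    for extent_id, phi in candidates:
        s = score_extent(w, phi)
        if s > best_score:
            best_id, best_score = extent_id, s
    return best_id, best_score

# Toy usage: three hypothetical candidate extents with 4-D features.
w = np.array([1.0, -0.5, 0.0, 2.0])
candidates = [
    ("person_1, frames 0-40",  np.array([0.2, 0.1, 0.3, 0.1])),
    ("person_1, frames 20-80", np.array([0.6, 0.2, 0.1, 0.5])),
    ("person_2, frames 0-80",  np.array([0.1, 0.9, 0.2, 0.0])),
]
best, score = infer_extent(w, candidates)
```

During training, a latent SVM alternates between this inference step (fixing `w`, choosing the best extent per video) and updating `w` on the inferred extents, which is what lets the spatial and temporal extents be learned without bounding-box annotation.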
Pages: 512 - 525
Number of pages: 14
Related Papers
50 records
  • [31] Spatial-temporal interaction learning based two-stream network for action recognition
    Liu, Tianyu
    Ma, Yujun
    Yang, Wenhan
    Ji, Wanting
    Wang, Ruili
    Jiang, Ping
    INFORMATION SCIENCES, 2022, 606 : 864 - 876
  • [32] An accurate violence detection framework using unsupervised spatial-temporal action translation network
    Ehsan, Tahereh Zarrat
    Nahvi, Manoochehr
    Mohtavipour, Seyed Mehdi
    VISUAL COMPUTER, 2024, 40 (03) : 1515 - 1535
  • [33] Human action recognition based on multi-mode spatial-temporal feature fusion
    Wang, Dongli
    Yang, Jun
    Zhou, Yan
    2019 22ND INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2019), 2019,
  • [34] ANOMALOUS HUMAN ACTION DETECTION USING A CASCADE OF DEEP LEARNING MODELS
    Riaz, Hamza
    Uzair, Muhammad
    Ullah, Habib
    Ullah, Mohib
    PROCEEDINGS OF THE 2021 9TH EUROPEAN WORKSHOP ON VISUAL INFORMATION PROCESSING (EUVIP), 2021,
  • [35] Action Temporal-Spatial Semantic Guide for 3D Human Pose Tracking
    Yu, Jialin
    Sun, Jifeng
    PROCEEDINGS OF THE 28TH CHINESE CONTROL AND DECISION CONFERENCE (2016 CCDC), 2016, : 1940 - 1945
  • [36] Learning Human Actions by Combining Global Dynamics and Local Appearance
    Luo, Guan
    Yang, Shuang
    Tian, Guodong
    Yuan, Chunfeng
    Hu, Weiming
    Maybank, Stephen J.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (12) : 2466 - 2482
  • [37] The spatial Laplacian and temporal energy pyramid representation for human action recognition using depth sequences
    Ji, Xiaopeng
    Cheng, Jun
    Tao, Dapeng
    Wu, Xinyu
    Feng, Wei
    KNOWLEDGE-BASED SYSTEMS, 2017, 122 : 64 - 74
  • [38] Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition
    Rani, R. Divya
    Prabhakar, C. J.
    HUMAN-CENTRIC INTELLIGENT SYSTEMS, 2025, 5 (01): 123 - 150
  • [39] LEARNING SILHOUETTE DYNAMICS FOR HUMAN ACTION RECOGNITION
    Luo, Guan
    Hu, Weiming
    2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013), 2013, : 2827 - 2831
  • [40] Temporal Attention-Pyramid Pooling for Temporal Action Detection
    Gan, Ming-Gang
    Zhang, Yan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 3799 - 3810