Learning Spatial and Temporal Extents of Human Actions for Action Detection

Times Cited: 38
Authors
Zhou, Zhong [1]
Shi, Feng [1]
Wu, Wei [1]
Affiliations
[1] Beihang University, State Key Laboratory of Virtual Reality Technology and Systems, Beijing 100191, People's Republic of China
Keywords
Action localization; action recognition; discriminative latent variable model; split-and-merge; framework; models
DOI
10.1109/TMM.2015.2404779
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
For the problem of action detection, most existing methods require that the relevant portions of the action of interest in training videos be manually annotated with bounding boxes. Some recent works avoid this tedious manual annotation by automatically identifying the relevant portions in training videos. However, these methods address identification in either the spatial or the temporal domain alone, and may therefore admit irrelevant content from the other domain. Such irrelevant content is undesirable in the training phase and degrades detection performance. This paper advances prior work by proposing a joint learning framework that simultaneously identifies the spatial and temporal extents of the action of interest in training videos. To obtain pixel-level localization results, our method uses dense trajectories extracted from videos as local features to represent actions. We first present a trajectory split-and-merge algorithm that segments a video into the background and several separated foreground moving objects, exploiting the inherent temporal smoothness of human actions to facilitate segmentation. Then, within a latent SVM framework built on the segmentation results, the spatial and temporal extents of the action of interest are treated as latent variables that are inferred jointly with action recognition. Experiments on two challenging datasets show that action detection with our learned spatial and temporal extents is superior to state-of-the-art methods.
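The inference step the abstract describes can be made concrete with a short sketch. The Python below is a minimal illustration under assumptions, not the authors' implementation: the candidate-extent encoding (segment_features), the bag-of-words features, and the function name are hypothetical stand-ins for the paper's trajectory split-and-merge output and latent SVM feature map phi(x, h).

```python
import numpy as np

def score_with_latent_extent(segment_features, w):
    """Score a video when the action's spatial/temporal extent is latent.

    Following the latent-SVM formulation sketched in the abstract, the
    model scores every candidate extent h (here, a foreground trajectory
    segment paired with a temporal window) and keeps the maximum:
        score(x) = max_h  w . phi(x, h)

    segment_features: dict mapping a candidate extent h to its feature
        vector phi(x, h), e.g. a bag-of-words histogram over the dense
        trajectories inside that extent (hypothetical encoding).
    w: learned linear weight vector.
    """
    best_score, best_extent = float("-inf"), None
    for h, phi in segment_features.items():
        s = float(np.dot(w, phi))  # linear score of one extent hypothesis
        if s > best_score:
            best_score, best_extent = s, h
    return best_score, best_extent

# Toy usage: three hypothetical candidate extents with 4-D features.
rng = np.random.default_rng(0)
candidates = {("seg1", (0, 40)): rng.random(4),
              ("seg2", (10, 80)): rng.random(4),
              ("seg3", (5, 60)): rng.random(4)}
w = rng.random(4)
print(score_with_latent_extent(candidates, w))
```

The max over latent extents is what lets classification and localization be solved together: the extent that best explains the classifier's decision is returned as the detection.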
Pages: 512 - 525
Number of Pages: 14