Learning Spatial and Temporal Extents of Human Actions for Action Detection

Times Cited: 38
Authors
Zhou, Zhong [1]
Shi, Feng [1]
Wu, Wei [1]
Affiliations
[1] Beihang University, State Key Laboratory of Virtual Reality Technology and Systems, Beijing 100191, People's Republic of China
Keywords
Action localization; action recognition; discriminative latent variable model; split-and-merge; framework; models
DOI
10.1109/TMM.2015.2404779
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
For the problem of action detection, most existing methods require that the relevant portions of the action of interest in training videos be manually annotated with bounding boxes. Some recent works avoid this tedious manual annotation by automatically identifying the relevant portions in training videos. However, these methods address identification in either the spatial or the temporal domain alone, and may therefore admit irrelevant content from the other domain. Such irrelevant content is undesirable in the training phase and degrades detection performance. This paper advances prior work by proposing a joint learning framework that simultaneously identifies the spatial and temporal extents of the action of interest in training videos. To obtain pixel-level localization results, our method uses dense trajectories extracted from videos as local features to represent actions. We first present a trajectory split-and-merge algorithm that segments a video into the background and several separated foreground moving objects, exploiting the inherent temporal smoothness of human actions to facilitate segmentation. Then, within a latent SVM framework built on the segmentation results, the spatial and temporal extents of the action of interest are treated as latent variables that are inferred jointly with action recognition. Experiments on two challenging datasets show that action detection with our learned spatial and temporal extents is superior to state-of-the-art methods.
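The inference step the abstract describes can be made concrete with a short sketch. The Python below is a minimal illustration under assumptions, not the authors' implementation: the candidate-extent encoding (segment_features), the bag-of-words features, and the function name are hypothetical stand-ins for the paper's trajectory split-and-merge output and latent SVM feature map phi(x, h).

```python
import numpy as np

def score_with_latent_extent(segment_features, w):
    """Score a video when the action's spatial/temporal extent is latent.

    Following the latent-SVM formulation sketched in the abstract, the
    model scores every candidate extent h (here, a foreground trajectory
    segment paired with a temporal window) and keeps the maximum:
        score(x) = max_h  w . phi(x, h)

    segment_features: dict mapping a candidate extent h to its feature
        vector phi(x, h), e.g. a bag-of-words histogram over the dense
        trajectories inside that extent (hypothetical encoding).
    w: learned linear weight vector.
    """
    best_score, best_extent = float("-inf"), None
    for h, phi in segment_features.items():
        s = float(np.dot(w, phi))  # linear score of one extent hypothesis
        if s > best_score:
            best_score, best_extent = s, h
    return best_score, best_extent

# Toy usage: three hypothetical candidate extents with 4-D features.
rng = np.random.default_rng(0)
candidates = {("seg1", (0, 40)): rng.random(4),
              ("seg2", (10, 80)): rng.random(4),
              ("seg3", (5, 60)): rng.random(4)}
w = rng.random(4)
print(score_with_latent_extent(candidates, w))
```

The max over latent extents is what lets classification and localization be solved together: the extent that best explains the classifier's decision is returned as the detection.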
Pages: 512 - 525
Number of Pages: 14