Learning Spatial and Temporal Extents of Human Actions for Action Detection

Cited by: 38
Authors
Zhou, Zhong [1]
Shi, Feng [1]
Wu, Wei [1]
Affiliation
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Keywords
Action localization; action recognition; discriminative latent variable model; split-and-merge; FRAMEWORK; MODELS;
DOI
10.1109/TMM.2015.2404779
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
For the problem of action detection, most existing methods require that relevant portions of the action of interest in training videos be manually annotated with bounding boxes. Some recent works have tried to avoid this tedious manual annotation by automatically identifying the relevant portions in training videos. However, these methods only address identification in either the spatial or the temporal domain, and may therefore include irrelevant content from the other domain. Such irrelevant content is undesirable in the training phase and leads to degraded detection performance. This paper advances prior work by proposing a joint learning framework that simultaneously identifies the spatial and temporal extents of the action of interest in training videos. To obtain pixel-level localization results, our method uses dense trajectories extracted from videos as local features to represent actions. We first present a trajectory split-and-merge algorithm to segment a video into the background and several separated foreground moving objects. In this algorithm, the inherent temporal smoothness of human actions is exploited to facilitate segmentation. Then, applying the latent SVM framework to the segmentation results, the spatial and temporal extents of the action of interest are treated as latent variables that are inferred jointly with action recognition. Experiments on two challenging datasets show that action detection with our learned spatial and temporal extents is superior to state-of-the-art methods.
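The core inference step described in the abstract can be sketched as follows: in a latent SVM, each candidate spatio-temporal extent (e.g. one foreground object from the split-and-merge segmentation paired with a temporal window) is a latent variable, and inference picks the candidate whose features score highest under a linear model. This is a minimal illustrative sketch of that generic latent-variable maximization, not the paper's actual implementation; the function names and toy features are hypothetical.

```python
import numpy as np

def score_extent(w, phi):
    """Linear score w . phi(x, z) for one candidate extent z."""
    return float(np.dot(w, phi))

def infer_extent(w, candidates):
    """Latent-variable inference: return the candidate extent with the
    highest linear score, as in argmax_z w . phi(x, z).

    candidates: list of (extent_id, feature_vector) pairs, e.g. one per
    (foreground trajectory group, temporal window) from segmentation.
    """
    best_id, best_score = None, float("-inf")
    for extent_id, phi in candidates:
        s = score_extent(w, phi)
        if s > best_score:
            best_id, best_score = extent_id, s
    return best_id, best_score

# Toy usage: three hypothetical candidate extents with 4-D features.
w = np.array([1.0, -0.5, 0.0, 2.0])
candidates = [
    ("person_1, frames 0-40",  np.array([0.2, 0.1, 0.3, 0.1])),
    ("person_1, frames 20-80", np.array([0.6, 0.2, 0.1, 0.5])),
    ("person_2, frames 0-80",  np.array([0.1, 0.9, 0.2, 0.0])),
]
best, score = infer_extent(w, candidates)
```

During training, a latent SVM alternates between this inference step (fixing `w`, choosing the best extent per video) and updating `w` on the inferred extents, which is what lets the spatial and temporal extents be learned without bounding-box annotation.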
Pages: 512 - 525
Number of pages: 14
Related Papers
50 records
  • [31] Spatial-temporal interaction learning based two-stream network for action recognition
    Liu, Tianyu
    Ma, Yujun
    Yang, Wenhan
    Ji, Wanting
    Wang, Ruili
    Jiang, Ping
    INFORMATION SCIENCES, 2022, 606 : 864 - 876
  • [32] An accurate violence detection framework using unsupervised spatial-temporal action translation network
    Ehsan, Tahereh Zarrat
    Nahvi, Manoochehr
    Mohtavipour, Seyed Mehdi
    VISUAL COMPUTER, 2024, 40 (03) : 1515 - 1535
  • [33] Human action recognition based on multi-mode spatial-temporal feature fusion
    Wang, Dongli
    Yang, Jun
    Zhou, Yan
    2019 22ND INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2019), 2019,
  • [34] ANOMALOUS HUMAN ACTION DETECTION USING A CASCADE OF DEEP LEARNING MODELS
    Riaz, Hamza
    Uzair, Muhammad
    Ullah, Habib
    Ullah, Mohib
    PROCEEDINGS OF THE 2021 9TH EUROPEAN WORKSHOP ON VISUAL INFORMATION PROCESSING (EUVIP), 2021,
  • [35] Action Temporal-Spatial Semantic Guide for 3D Human Pose Tracking
    Yu, Jialin
    Sun, Jifeng
    PROCEEDINGS OF THE 28TH CHINESE CONTROL AND DECISION CONFERENCE (2016 CCDC), 2016, : 1940 - 1945
  • [36] Learning Human Actions by Combining Global Dynamics and Local Appearance
    Luo, Guan
    Yang, Shuang
    Tian, Guodong
    Yuan, Chunfeng
    Hu, Weiming
    Maybank, Stephen J.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (12) : 2466 - 2482
  • [37] The spatial Laplacian and temporal energy pyramid representation for human action recognition using depth sequences
    Ji, Xiaopeng
    Cheng, Jun
    Tao, Dapeng
    Wu, Xinyu
    Feng, Wei
    KNOWLEDGE-BASED SYSTEMS, 2017, 122 : 64 - 74
  • [38] Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition
    Rani, R. Divya
    Prabhakar, C. J.
    HUMAN-CENTRIC INTELLIGENT SYSTEMS, 2025, 5 (01): 123 - 150
  • [39] LEARNING SILHOUETTE DYNAMICS FOR HUMAN ACTION RECOGNITION
    Luo, Guan
    Hu, Weiming
    2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013), 2013, : 2827 - 2831
  • [40] Temporal Attention-Pyramid Pooling for Temporal Action Detection
    Gan, Ming-Gang
    Zhang, Yan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 3799 - 3810