Detecting Human Action as the Spatio-Temporal Tube of Maximum Mutual Information

Cited: 18
Authors
Wang, Taiqing [1 ,2 ]
Wang, Shengjin [1 ,2 ]
Ding, Xiaoqing [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
Funding
National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China;
Keywords
Action detection; feature trajectory; mutual information; spatio-temporal cuboid (ST-cuboid); spatio-temporal tube (ST-tube); RECOGNITION; MOTION; DENSE;
DOI
10.1109/TCSVT.2013.2276856
CLC classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline codes
0808 ; 0809 ;
Abstract
Human action detection in complex scenes is a challenging problem due to its high-dimensional search space and dynamic backgrounds. To achieve efficient and accurate action detection, we represent a video sequence as a collection of feature trajectories and model a human action as the spatio-temporal tube (ST-tube) of maximum mutual information. First, a random forest is built to evaluate the mutual information of feature trajectories with respect to the action class, and then a first-order Markov model is introduced to recursively infer the action regions at consecutive frames. By exploiting the time-continuity property of feature trajectories, the action region can be inferred efficiently at large temporal intervals. Finally, we obtain an ST-tube by concatenating the consecutive action regions bounding the human bodies. Compared with the popular spatio-temporal cuboid action model, the proposed ST-tube model is not only more efficient but also more accurate in action localization. Experimental results on the KTH, CMU, and UCF Sports datasets validate the superiority of our approach over state-of-the-art methods in both localization accuracy and time efficiency.
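The final step of the abstract's pipeline, concatenating per-frame action regions bounded by informative trajectories into an ST-tube, can be sketched minimally as follows. This is a hypothetical illustration only: the per-trajectory mutual-information scores and the threshold are assumed inputs here, and the paper's random-forest scoring and Markov-model inference are not reproduced.

```python
import numpy as np

def st_tube(trajs, scores, thresh=0.5):
    """Concatenate per-frame bounding boxes of high-scoring feature
    trajectories into a spatio-temporal tube (one box per frame).

    trajs  : list of (frame, x, y) arrays, one per trajectory
    scores : per-trajectory mutual-information estimates (assumed given;
             the paper derives these from a random forest)
    thresh : assumed score cutoff for "informative" trajectories
    """
    per_frame = {}
    for traj, s in zip(trajs, scores):
        if s < thresh:  # discard uninformative trajectories
            continue
        for f, x, y in traj:
            per_frame.setdefault(int(f), []).append((x, y))
    tube = {}
    for f, pts in sorted(per_frame.items()):
        xs, ys = zip(*pts)
        # action region at frame f: tight box around the kept points
        tube[f] = (min(xs), min(ys), max(xs), max(ys))
    return tube

# toy example: two informative trajectories spanning frames 0-1
trajs = [np.array([[0, 10, 20], [1, 12, 22]]),
         np.array([[0, 50, 60], [1, 52, 62]])]
tube = st_tube(trajs, scores=[0.9, 0.8])
```

In the paper the per-frame regions come from recursive Markov inference rather than a simple threshold-and-box rule; the sketch only shows how a tube is assembled once those regions' supporting trajectories are known.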
Pages: 277-290
Page count: 14
Related Papers
50 records in total
  • [31] Learning hierarchical spatio-temporal pattern for human activity prediction
    Ding, Wenwen
    Liu, Kai
    Cheng, Fei
    Zhang, Jin
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2016, 35 : 103 - 111
  • [32] Video action detection by learning graph-based spatio-temporal interactions
    Tomei, Matteo
    Baraldi, Lorenzo
    Calderara, Simone
    Bronzin, Simone
    Cucchiara, Rita
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 206
  • [33] Learning motion representation for real-time spatio-temporal action localization
    Zhang, Dejun
    He, Linchao
    Tu, Zhigang
    Zhang, Shifu
    Han, Fei
    Yang, Boxiong
    PATTERN RECOGNITION, 2020, 103
  • [34] Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection
    Song, Sijie
    Lan, Cuiling
    Xing, Junliang
    Zeng, Wenjun
    Liu, Jiaying
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (07) : 3459 - 3471
  • [35] Action Recognition Based on Efficient Deep Feature Learning in the Spatio-Temporal Domain
    Husain, Farzad
    Dellen, Babette
    Torras, Carme
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2016, 1 (02) : 984 - 991
  • [36] SURF-based Spatio-Temporal History Image Method for Action Representation
    Ahad, Md. Atiqur Rahman
    Tan, J. K.
    Kim, H.
    Ishikawa, S.
    2011 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2011,
  • [37] Spatio-Temporal Detection of Fine-Grained Dyadic Human Interactions
    van Gemeren, Coert
    Poppe, Ronald
    Veltkamp, Remco C.
    HUMAN BEHAVIOR UNDERSTANDING, 2016, 9997 : 116 - 133
  • [38] A Tracking-Based Two-Stage Framework for Spatio-Temporal Action Detection
    Luo, Jing
    Yang, Yulin
    Liu, Rongkai
    Chen, Li
    Fei, Hongxiao
    Hu, Chao
    Shi, Ronghua
    Zou, You
    ELECTRONICS, 2024, 13 (03)
  • [39] Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals
    Song, Yeongtaek
    Kim, Incheol
    SENSORS, 2019, 19 (05)
  • [40] Spatio-temporal structure of human motion primitives and its application to motion prediction
    Takano, Wataru
    Imagawa, Hirotaka
    Nakamura, Yoshihiko
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2016, 75 : 288 - 296