Detecting Human Action as the Spatio-Temporal Tube of Maximum Mutual Information

Cited: 18
Authors
Wang, Taiqing [1 ,2 ]
Wang, Shengjin [1 ,2 ]
Ding, Xiaoqing [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
Funding
National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China;
Keywords
Action detection; feature trajectory; mutual information; spatio-temporal cuboid (ST-cuboid); spatio-temporal tube (ST-tube); RECOGNITION; MOTION; DENSE;
DOI
10.1109/TCSVT.2013.2276856
CLC classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline codes
0808 ; 0809 ;
Abstract
Human action detection in complex scenes is a challenging problem due to its high-dimensional search space and dynamic backgrounds. To achieve efficient and accurate action detection, we represent a video sequence as a collection of feature trajectories and model a human action as the spatio-temporal tube (ST-tube) of maximum mutual information. First, a random forest is built to evaluate the mutual information of feature trajectories with respect to the action class, and then a first-order Markov model is introduced to recursively infer the action regions at consecutive frames. By exploiting the time-continuity property of feature trajectories, the action region can be inferred efficiently at large temporal intervals. Finally, we obtain an ST-tube by concatenating the consecutive action regions bounding the human bodies. Compared with the popular spatio-temporal cuboid action model, the proposed ST-tube model is not only more efficient but also more accurate in action localization. Experimental results on the KTH, CMU, and UCF Sports datasets validate the superiority of our approach over state-of-the-art methods in both localization accuracy and time efficiency.
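The final step of the abstract's pipeline, concatenating per-frame action regions bounded by informative trajectories into an ST-tube, can be sketched minimally as follows. This is a hypothetical illustration only: the per-trajectory mutual-information scores and the threshold are assumed inputs here, and the paper's random-forest scoring and Markov-model inference are not reproduced.

```python
import numpy as np

def st_tube(trajs, scores, thresh=0.5):
    """Concatenate per-frame bounding boxes of high-scoring feature
    trajectories into a spatio-temporal tube (one box per frame).

    trajs  : list of (frame, x, y) arrays, one per trajectory
    scores : per-trajectory mutual-information estimates (assumed given;
             the paper derives these from a random forest)
    thresh : assumed score cutoff for "informative" trajectories
    """
    per_frame = {}
    for traj, s in zip(trajs, scores):
        if s < thresh:  # discard uninformative trajectories
            continue
        for f, x, y in traj:
            per_frame.setdefault(int(f), []).append((x, y))
    tube = {}
    for f, pts in sorted(per_frame.items()):
        xs, ys = zip(*pts)
        # action region at frame f: tight box around the kept points
        tube[f] = (min(xs), min(ys), max(xs), max(ys))
    return tube

# toy example: two informative trajectories spanning frames 0-1
trajs = [np.array([[0, 10, 20], [1, 12, 22]]),
         np.array([[0, 50, 60], [1, 52, 62]])]
tube = st_tube(trajs, scores=[0.9, 0.8])
```

In the paper the per-frame regions come from recursive Markov inference rather than a simple threshold-and-box rule; the sketch only shows how a tube is assembled once those regions' supporting trajectories are known.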
Pages: 277-290
Page count: 14
Related Papers
50 records in total
  • [31] Learning hierarchical spatio-temporal pattern for human activity prediction
    Ding, Wenwen
    Liu, Kai
    Cheng, Fei
    Zhang, Jin
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2016, 35 : 103 - 111
  • [32] Video action detection by learning graph-based spatio-temporal interactions
    Tomei, Matteo
    Baraldi, Lorenzo
    Calderara, Simone
    Bronzin, Simone
    Cucchiara, Rita
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 206
  • [33] Learning motion representation for real-time spatio-temporal action localization
    Zhang, Dejun
    He, Linchao
    Tu, Zhigang
    Zhang, Shifu
    Han, Fei
    Yang, Boxiong
    PATTERN RECOGNITION, 2020, 103
  • [34] Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection
    Song, Sijie
    Lan, Cuiling
    Xing, Junliang
    Zeng, Wenjun
    Liu, Jiaying
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (07) : 3459 - 3471
  • [35] Action Recognition Based on Efficient Deep Feature Learning in the Spatio-Temporal Domain
    Husain, Farzad
    Dellen, Babette
    Torras, Carme
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2016, 1 (02) : 984 - 991
  • [36] SURF-based Spatio-Temporal History Image Method for Action Representation
    Ahad, Md. Atiqur Rahman
    Tan, J. K.
    Kim, H.
    Ishikawa, S.
    2011 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2011,
  • [37] Spatio-Temporal Detection of Fine-Grained Dyadic Human Interactions
    van Gemeren, Coert
    Poppe, Ronald
    Veltkamp, Remco C.
    HUMAN BEHAVIOR UNDERSTANDING, 2016, 9997 : 116 - 133
  • [38] A Tracking-Based Two-Stage Framework for Spatio-Temporal Action Detection
    Luo, Jing
    Yang, Yulin
    Liu, Rongkai
    Chen, Li
    Fei, Hongxiao
    Hu, Chao
    Shi, Ronghua
    Zou, You
    ELECTRONICS, 2024, 13 (03)
  • [39] Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals
    Song, Yeongtaek
    Kim, Incheol
    SENSORS, 2019, 19 (05)
  • [40] Spatio-temporal structure of human motion primitives and its application to motion prediction
    Takano, Wataru
    Imagawa, Hirotaka
    Nakamura, Yoshihiko
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2016, 75 : 288 - 296