Detecting Human Action as the Spatio-Temporal Tube of Maximum Mutual Information

Cited by: 18
Authors
Wang, Taiqing [1, 2]
Wang, Shengjin [1, 2]
Ding, Xiaoqing [1, 2]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
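Funding
National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)
Keywords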
Action detection; feature trajectory; mutual information; spatio-temporal cuboid (ST-cuboid); spatio-temporal tube (ST-tube); RECOGNITION; MOTION; DENSE;
DOI
10.1109/TCSVT.2013.2276856
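Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract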
Human action detection in complex scenes is a challenging problem due to its high-dimensional search space and dynamic backgrounds. To achieve efficient and accurate action detection, we represent a video sequence as a collection of feature trajectories and model human action as the spatio-temporal tube (ST-tube) of maximum mutual information. First, a random forest is built to evaluate the mutual information of feature trajectories toward the action class, and then a first-order Markov model is introduced to recursively infer the action regions in consecutive frames. By exploiting the time-continuity property of feature trajectories, the action region is efficiently inferred at large temporal intervals. Finally, we obtain an ST-tube by concatenating the consecutive action regions bounding the human bodies. Compared with the popular spatio-temporal cuboid action model, the proposed ST-tube model is not only more efficient, but also more accurate in action localization. Experimental results on the KTH, CMU, and UCF Sports datasets validate the superiority of our approach over state-of-the-art methods in both localization accuracy and time efficiency.
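The recursive inference over consecutive frames described in the abstract can be illustrated with a generic first-order Markov (Viterbi-style) search. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `best_tube`, the per-frame candidate regions, the score inputs (standing in for mutual-information scores), and the squared-displacement smoothness penalty are all hypothetical choices made for illustration.

```python
def best_tube(scores, centers, smooth=1.0):
    """Pick one candidate region per frame so that the summed score is maximal.

    scores[t][k]  -- hypothetical score of candidate region k in frame t
                     (a stand-in for a mutual-information score)
    centers[t][k] -- (x, y) center of that candidate region
    smooth        -- assumed weight of the squared-displacement penalty
                     between regions chosen in consecutive frames
    Returns (path, total), where path[t] is the chosen candidate index.
    """
    best = [list(scores[0])]   # best[t][k]: best total score ending in region k
    back = []                  # back[t-1][k]: best predecessor of region k
    for t in range(1, len(scores)):
        row, ptr = [], []
        for k, (x, y) in enumerate(centers[t]):
            # First-order Markov assumption: only the previous frame matters.
            cands = [
                best[t - 1][j] - smooth * ((x - px) ** 2 + (y - py) ** 2)
                for j, (px, py) in enumerate(centers[t - 1])
            ]
            j_star = max(range(len(cands)), key=cands.__getitem__)
            row.append(scores[t][k] + cands[j_star])
            ptr.append(j_star)
        best.append(row)
        back.append(ptr)

    # Backtrack from the best final region to recover the whole tube.
    k = max(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path, total


# Toy input: two candidate regions per frame, three frames.
toy_scores = [[1.0, 0.5], [0.2, 0.9], [1.0, 0.1]]
toy_centers = [[(0, 0), (10, 10)]] * 3
smooth_path, smooth_total = best_tube(toy_scores, toy_centers, smooth=1.0)
free_path, free_total = best_tube(toy_scores, toy_centers, smooth=0.0)
```

With the penalty enabled, the toy tube stays on one region across frames; with `smooth=0.0` it simply follows the highest-scoring region in each frame, which shows why a temporal coupling term is needed to obtain a coherent tube.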
Pages: 277 - 290
Number of pages: 14
Related papers
50 in total
  • [21] Detecting spatio-temporal hotspots of scarlet fever in Taiwan with spatio-temporal Gi* statistic
    Tang, Jia-Hong
    Tseng, Tzu-Jung
    Chan, Ta-Chien
    PLOS ONE, 2019, 14 (04)
  • [22] Model term selection for spatio-temporal system identification using mutual information
    Wang, Shu
    Wei, Hua-Liang
    Coca, Daniel
    Billings, Stephen A.
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2013, 44 (02) : 223 - 231
  • [23] Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos
    Duta, Ionut C.
    Ionescu, Bogdan
    Aizawa, Kiyoharu
    Sebe, Nicu
    MULTIMEDIA MODELING (MMM 2017), PT I, 2017, 10132 : 365 - 378
  • [24] Hierarchical and Spatio-Temporal Sparse Representation for Human Action Recognition
    Tian, Yi
    Kong, Yu
    Ruan, Qiuqi
    An, Gaoyun
    Fu, Yun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (04) : 1748 - 1762
  • [25] Human Action Recognition Based on a Spatio-Temporal Video Autoencoder
    Sousa e Santos, Anderson Carlos
    Pedrini, Helio
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2020, 34 (11)
  • [26] Spatio-temporal SIFT and Its Application to Human Action Classification
    Al Ghamdi, Manal
    Zhang, Lei
    Gotoh, Yoshihiko
    COMPUTER VISION - ECCV 2012: WORKSHOPS AND DEMONSTRATIONS, PT I, 2012, 7583 : 301 - 310
  • [27] Bag of Spatio-temporal Synonym Sets for Human Action Recognition
    Pang, Lin
    Cao, Juan
    Guo, Junbo
    Lin, Shouxun
    Song, Yan
    ADVANCES IN MULTIMEDIA MODELING, PROCEEDINGS, 2010, 5916 : 422 - 432
  • [28] SPATIO-TEMPORAL PYRAMIDAL ACCORDION REPRESENTATION FOR HUMAN ACTION RECOGNITION
    Sekma, Manel
    Mejdoub, Mahmoud
    Ben Amar, Chokri
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [29] Transform based spatio-temporal descriptors for human action recognition
    Shao, Ling
    Gao, Ruoyun
    Liu, Yan
    Zhang, Hui
    NEUROCOMPUTING, 2011, 74 (06) : 962 - 973
  • [30] Action recognition based on spatio-temporal information and nonnegative component representation
    Wang J.
    Zhang X.
    Zhang P.
    Jiang L.
    Luo L.
    Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition), 2016, 46 (04) : 675 - 680