Sparse Coding on Local Spatial-Temporal Volumes for Human Action Recognition

Cited by: 0
Authors
Zhu, Yan [1]
Zhao, Xu [1]
Fu, Yun [2]
Liu, Yuncai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China
[2] SUNY Buffalo, Dept CSE, Buffalo, NY 14260 USA
Source
Keywords
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
By extracting local spatial-temporal features from videos, many recently proposed approaches to action recognition achieve promising performance. The Bag-of-Words (BoW) model is commonly used in these approaches to obtain video-level representations. However, the BoW model coarsely assigns each feature vector to its single closest visual word, inevitably causing nontrivial quantization errors and limiting further improvement of classification rates. To obtain a more accurate and discriminative representation, in this paper we propose an approach to action recognition that encodes local 3D spatial-temporal gradient features within the sparse coding framework. In doing so, each local spatial-temporal feature is transformed into a linear combination of a few "atoms" from a trained dictionary. In addition, we investigate constructing the dictionary under the guidance of transfer learning: we collect a large, diverse set of video clips from sports games and movies, from which the universal atoms composing the dictionary are learned by an online learning strategy. We test our approach on the KTH and UCF Sports datasets. Experimental results demonstrate that our approach outperforms state-of-the-art techniques on the KTH dataset and achieves comparable performance on the UCF Sports dataset.
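The encoding step the abstract describes maps naturally onto standard dictionary-learning tools. Below is a minimal Python sketch, assuming scikit-learn: it learns a dictionary with an online (mini-batch) strategy, encodes each local descriptor as a sparse combination of a few atoms, and pools the codes into a video-level vector. The descriptor dimensionality (162), dictionary size (256), sparsity weight, and max-pooling step are illustrative assumptions, not the paper's reported settings, and random arrays stand in for the actual 3D spatial-temporal gradient features.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

rng = np.random.default_rng(0)

# Stand-ins for local 3D spatial-temporal gradient descriptors; in the paper
# these are extracted from video volumes (162-dim is an illustrative guess).
train_descriptors = rng.standard_normal((5000, 162))

# Learn the dictionary of "atoms" with an online (mini-batch) strategy,
# as the abstract describes; n_components and alpha are assumptions.
dico = MiniBatchDictionaryLearning(n_components=256, alpha=1.0,
                                   batch_size=64, random_state=0)
dico.fit(train_descriptors)

# Encode the descriptors of one video: each row becomes a sparse linear
# combination of a few dictionary atoms (lasso solver used here).
video_descriptors = rng.standard_normal((300, 162))
codes = sparse_encode(video_descriptors, dico.components_,
                      algorithm='lasso_lars', alpha=1.0)

# Pool sparse codes into one video-level representation. Max pooling over
# absolute code values is one common choice; the paper may pool differently.
video_repr = np.abs(codes).max(axis=0)
print(video_repr.shape)  # (256,)
```

Feeding such pooled vectors to a linear classifier (e.g., an SVM) would complete a recognition pipeline; the pooling operator and all hyperparameters above are placeholders, since this record omits those details.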
Pages: 660 / +
Page count: 3