STAR: Efficient SpatioTemporal Modeling for Action Recognition

Cited by: 2
Authors
Kumar, Abhijeet [1 ]
Abrams, Samuel [1 ]
Kumar, Abhishek [1 ]
Narayanan, Vijaykrishnan [1 ]
Affiliations
[1] Pennsylvania State University, EECS Department, State College, PA 16802, USA
Keywords
Action recognition; Compressed domain; I-frames; Spatial-temporal 2D convolutional networks
DOI
10.1007/s00034-022-02160-x
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
Action recognition in video has gained significant attention over the past several years. While conventional 2D CNNs have found great success in understanding images, they are less effective at capturing the temporal relationships present in video. By contrast, 3D CNNs capture spatiotemporal information well but incur a high computational cost, making deployment challenging. In video, key information is typically confined to a small number of frames, yet many current approaches decompress and process every frame, wasting resources. Others operate directly in the compressed domain but require multiple input streams to interpret the data. In our work, we operate directly on compressed video and extract information solely from intra-coded frames (I-frames), avoiding the use of motion vectors and residuals for motion information; this makes ours a single-stream network. The result is lower processing time and energy consumption, which in turn makes the approach accessible to a wider range of machines and uses. We evaluate our framework extensively on the UCF-101 (Soomro et al. in UCF101: a dataset of 101 human actions classes from videos in the Wild, 2012) and HMDB-51 (Kuehne et al., in: Jhuang, Garrote, Poggio, Serre (eds) Proceedings of the International Conference on Computer Vision (ICCV), 2011) datasets and show that computational complexity is reduced significantly while accuracy remains competitive with existing compressed-domain efforts: 92.6% top-1 accuracy on UCF-101 and 62.9% on HMDB-51, with 24.3M parameters, 4 GFLOPs, and energy savings of over 11× across the two datasets versus CoViAR (Wu et al. in Compressed video action recognition, 2018).
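The abstract's core idea is that a compressed bitstream already marks which frames are intra-coded, so a single-stream classifier can decode and score only those. A minimal sketch of that selection-and-aggregation logic is below; the function names and the logit-averaging head are illustrative assumptions, not the paper's actual architecture (in practice, frame types would come from the codec, e.g. PyAV's `skip_frame = "NONKEY"` decodes only keyframes).

```python
import numpy as np


def select_iframes(frame_types):
    """Return indices of intra-coded (I) frames in a decoded GOP sequence.

    frame_types: list of picture-type strings ("I", "P", "B") as a codec
    would report them; only I-frames are kept, so motion vectors and
    residuals of P/B frames are never touched (single-stream).
    """
    return [i for i, t in enumerate(frame_types) if t == "I"]


def classify_video(iframe_logits, num_classes=101):
    """Hypothetical head: average per-I-frame logits from a 2D CNN backbone
    and pick the top-1 class. iframe_logits: (num_iframes, num_classes)."""
    return int(np.asarray(iframe_logits).mean(axis=0).argmax())


# A typical compressed stream: one I-frame opening each 7-frame GOP.
gop = ["I", "P", "B", "B", "P", "B", "B"] * 3
idx = select_iframes(gop)
print(idx)  # → [0, 7, 14]: only 3 of 21 frames are ever decoded
```

Processing 3 frames instead of 21 is where the reported compute and energy savings come from; the trade-off is that all temporal cues must be recovered from the sparse I-frame set alone.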
Pages: 705-723 (19 pages)