STAR: Efficient SpatioTemporal Modeling for Action Recognition

Cited by: 2
Authors
Kumar, Abhijeet [1]
Abrams, Samuel [1]
Kumar, Abhishek [1]
Narayanan, Vijaykrishnan [1]
Affiliations
[1] Penn State Univ, EECS Dept, State Coll, PA 16802 USA
Keywords
Action recognition; Compressed domain; I-frames; Spatial-temporal 2D convolutional networks; DOMAIN;
DOI
10.1007/s00034-022-02160-x
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract
Action recognition in video has gained significant attention over the past several years. While conventional 2D CNNs have found great success in understanding images, they are less effective at capturing the temporal relationships present in video. By contrast, 3D CNNs capture spatiotemporal information well, but they incur a high computational cost, which makes deployment challenging. In video, key information is typically confined to a small number of frames, yet many current approaches decompress and process all frames, which wastes resources. Others work directly in the compressed domain but require multiple input streams to understand the data. In our work, we operate directly on compressed video and extract information solely from intracoded frames (I-frames), avoiding the use of motion vectors and residuals for motion information, which makes this a single-stream network. This reduces processing time and energy consumption and, by extension, makes the approach accessible to a wider range of machines and uses. We test extensively on the UCF101 (Soomro et al. in UCF101: a dataset of 101 human actions classes from videos in the Wild, 2012) and HMDB51 (Kuehne et al., in: Jhuang, Garrote, Poggio, Serre (eds) Proceedings of the international conference on computer vision (ICCV), 2011) datasets to evaluate our framework and show that computational complexity is reduced significantly while accuracy remains competitive with existing compressed-domain efforts: 92.6% top-1 accuracy on UCF101 and 62.9% on HMDB51, with 24.3M parameters and 4 GFLOPS, and energy savings of over 11x on the two datasets versus CoViAR (Wu et al. in Compressed video action recognition, 2018).
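The idea of reading only I-frames can be illustrated with a minimal sketch. This is not the authors' pipeline: the frame-type string, the group-of-pictures length of 12, and the helper names `iframe_indices` and `decoded_fraction` are all hypothetical, chosen only to show how few frames a single-stream I-frame model would need to decode.

```python
from typing import List


def iframe_indices(frame_types: str) -> List[int]:
    """Return the positions of intracoded frames in a string such as 'IPPPIPPP'."""
    return [i for i, t in enumerate(frame_types) if t == "I"]


def decoded_fraction(frame_types: str) -> float:
    """Fraction of frames that must be decoded when only I-frames are used."""
    if not frame_types:
        return 0.0
    return len(iframe_indices(frame_types)) / len(frame_types)


# Hypothetical stream (assumption): 3 groups of pictures,
# each made of one I-frame followed by 11 P-frames.
stream = ("I" + "P" * 11) * 3

print(iframe_indices(stream))    # positions of the I-frames
print(decoded_fraction(stream))  # share of frames actually decoded
```

Under this assumed layout, one frame in twelve is decoded; real encoders choose their own I-frame spacing, so the ratio varies by video.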
Pages: 705-723
Page count: 19
Related papers
50 in total
  • [21] Action-Stage Emphasized Spatiotemporal VLAD for Video Action Recognition
    Tu, Zhigang
    Li, Hongyan
    Zhang, Dejun
    Dauwels, Justin
    Li, Baoxin
    Yuan, Junsong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (06) : 2799 - 2812
  • [22] EFFICIENT ACTION RECOGNITION FROM COMPRESSED DEPTH MAPS
    Miao, Jie
    Jia, Xiaoyi
    Mathew, Reji
    Xu, Xiangmin
    Taubman, David
    Qing, Chunmei
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 16 - 20
  • [23] Spatiotemporal Features for Action Recognition and Salient Event Detection
    Rapantzikos, Konstantinos
    Avrithis, Yannis
    Kollias, Stefanos
    COGNITIVE COMPUTATION, 2011, 3 (01) : 167 - 184
  • [24] A spatiotemporal and motion information extraction network for action recognition
    Wang, Wei
    Wang, Xianmin
    Zhou, Mingliang
    Wei, Xuekai
    Li, Jing
    Ren, Xiaojun
    Zong, Xuemei
    WIRELESS NETWORKS, 2024, 30 (06) : 5389 - 5405
  • [25] SpatioTemporal focus for skeleton-based action recognition
    Wu, Liyu
    Zhang, Can
    Zou, Yuexian
    PATTERN RECOGNITION, 2023, 136
  • [26] Spatiotemporal Features for Action Recognition and Salient Event Detection
    Konstantinos Rapantzikos
    Yannis Avrithis
    Stefanos Kollias
    Cognitive Computation, 2011, 3 : 167 - 184
  • [27] Action Spotting and Recognition Based on a Spatiotemporal Orientation Analysis
    Derpanis, Konstantinos G.
    Sizintsev, Mikhail
    Cannons, Kevin J.
    Wildes, Richard P.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (03) : 527 - 540
  • [28] Spatiotemporal Saliency Representation Learning for Video Action Recognition
    Kong, Yongqiang
    Wang, Yunhong
    Li, Annan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1515 - 1528
  • [29] A Spatiotemporal Excitation Classifier Head for Action Recognition Applications
    Dinh Nguyen
    Liu, Siying
    Sintunata, Vicky
    Wang, Yue
    Ho, Jack
    Lim, ZhaoYong
    Lee, Ryan
    Leman, Karianto
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 59 - 62
  • [30] SkeletonCapsuleNet: An Efficient Network for Action Recognition
    Yu, Yue
    Tian, Niehao
    Chen, Xiangru
    Li, Ying
    2018 8TH INTERNATIONAL CONFERENCE ON VIRTUAL REALITY AND VISUALIZATION (ICVRV), 2018, : 74 - 77