Spatial-Temporal Interleaved Network for Efficient Action Recognition

Cited by: 1
Authors
Jiang, Shengqin [1 ,2 ,3 ]
Zhang, Haokui [4 ]
Qi, Yuankai [5 ]
Liu, Qingshan [6 ]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Comp Sci, Nanjing 210044, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Minist Educ, Engn Res Ctr Digital Forens, Nanjing 210044, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Jiangsu Collaborat Innovat Ctr Atmospher Environm, Nanjing 210044, Peoples R China
[4] Northwestern Polytech Univ, Sch Comp Sci, Xian 710129, Peoples R China
[5] Macquarie Univ, Sch Comp, Sydney, NSW 2109, Australia
[6] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Convolution; Three-dimensional displays; Kernel; Computational modeling; Videos; Transformers; Solid modeling; 3D convolution; action recognition; feature interaction; spatial-temporal features;
DOI
10.1109/TII.2024.3450021
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Decomposing 3D convolution considerably reduces the computational complexity of 3D convolutional neural networks, yet simply stacking the decomposed convolutions restricts network performance. To this end, we propose a spatial-temporal interleaved network for efficient action recognition. Through a close analysis of this task, we revisit the structure of 3D neural networks for action recognition from the following perspectives. To enhance the learning of robust spatial-temporal features, we first propose an interleaved feature interaction module that comprehensively explores cross-layer features and captures the most discriminative information among them. To keep the network lightweight, we introduce a boosted parallel pseudo-3D module that avoids a substantial number of computations at the lower to middle levels while enhancing temporal and spatial features in parallel at the high levels. Furthermore, we exploit a spatial-temporal differential attention mechanism to suppress redundant features in different dimensions at the cost of a nearly negligible number of parameters. Lastly, extensive experiments on four action recognition benchmarks demonstrate the advantages and efficiency of the proposed method. Specifically, on the Something-Something V1 dataset, our method attains a 15.2% improvement in Top-1 accuracy over our baseline, a stack of full 3D convolutional layers, while using only 18.2% of the parameters.
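The abstract's opening claim, that decomposing a 3D convolution cuts its cost, can be illustrated with a simple weight count. The sketch below is not the paper's boosted parallel pseudo-3D module; it assumes a generic serial factorization of a 3x3x3 kernel into a 1x3x3 spatial convolution followed by a 3x1x1 temporal convolution, with 64 input and output channels and no bias terms. The function names and channel counts are hypothetical and chosen only for illustration.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a full 3D convolution (bias ignored)."""
    return c_in * c_out * kt * kh * kw


def decomposed_params(c_in, c_out, kt, kh, kw):
    """Weight count of a generic spatial-then-temporal factorization.

    Assumption: a 1 x kh x kw spatial conv mapping c_in -> c_out,
    followed by a kt x 1 x 1 temporal conv mapping c_out -> c_out.
    This is a textbook factorization, not the module from the paper.
    """
    spatial = c_in * c_out * 1 * kh * kw
    temporal = c_out * c_out * kt * 1 * 1
    return spatial + temporal


# Hypothetical layer size, chosen only for illustration.
full = conv3d_params(64, 64, 3, 3, 3)
decomposed = decomposed_params(64, 64, 3, 3, 3)
ratio = decomposed / full

print(f"full 3D weights:    {full}")
print(f"decomposed weights: {decomposed}")
print(f"ratio:              {ratio:.3f}")
```

Under these assumptions the factorized layer holds 49,152 weights against 110,592 for the full 3D kernel, roughly 44%. The 18.2% parameter figure reported in the abstract refers to the authors' complete network against their baseline and cannot be derived from this single-layer count.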
Pages: 178 - 187
Page count: 10
Related papers
50 in total
  • [21] R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition
    Liu, Quanle
    Che, Xiangjiu
    Bie, Mei
    IEEE ACCESS, 2019, 7 : 82246 - 82255
  • [22] SAST: Learning Semantic Action-Aware Spatial-Temporal Features for Efficient Action Recognition
    Wang, Fei
    Wang, Guorui
    Huang, Yunwen
    Chu, Hao
    IEEE ACCESS, 2019, 7 : 164876 - 164886
  • [23] STA-TSN: Spatial-Temporal Attention Temporal Segment Network for action recognition in video
    Yang, Guoan
    Yang, Yong
    Lu, Zhengzhi
    Yang, Junjie
    Liu, Deyang
    Zhou, Chuanbo
    Fan, Zien
    PLOS ONE, 2022, 17 (03):
  • [24] Action Recognition by Joint Spatial-Temporal Motion Feature
    Zhang, Weihua
    Zhang, Yi
    Gao, Chaobang
    Zhou, Jiliu
    JOURNAL OF APPLIED MATHEMATICS, 2013,
  • [25] Spatial-Temporal Separable Attention for Video Action Recognition
    Guo, Xi
    Hu, Yikun
    Chen, Fang
    Jin, Yuhui
    Qiao, Jian
    Huang, Jian
    Yang, Qin
    2022 INTERNATIONAL CONFERENCE ON FRONTIERS OF ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, FAIML, 2022, : 224 - 228
  • [26] Spatial-Temporal Pyramid Graph Reasoning for Action Recognition
    Geng, Tiantian
    Zheng, Feng
    Hou, Xiaorong
    Lu, Ke
    Qi, Guo-Jun
    Shao, Ling
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5484 - 5497
  • [27] Action recognition with spatial-temporal discriminative filter banks
    Martinez, Brais
    Modolo, Davide
    Xiong, Yuanjun
    Tighe, Joseph
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5481 - 5490
  • [28] Select and Focus: Action Recognition with Spatial-Temporal Attention
    Chan, Wensong
    Tian, Zhiqiang
    Liu, Shuai
    Ren, Jing
    Lan, Xuguang
    INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2019, PT III, 2019, 11742 : 461 - 471
  • [29] STAN: Spatial-Temporal Awareness Network for Temporal Action Detection
    Liu, Minghao
    Liu, Haiyi
    Zhao, Sirui
    Ma, Fei
    Li, Minglei
    Dai, Zonghong
    Wang, Hao
    Xu, Tong
    Chen, Enhong
    PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON MULTIMEDIA CONTENT ANALYSIS IN SPORTS, MMSPORTS 2023, 2023, : 161 - 165
  • [30] Spatial-Temporal gated graph attention network for skeleton-based action recognition
    Rahevar, Mrugendrasinh
    Ganatra, Amit
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 929 - 939