CSST-Net: Channel Split Spatiotemporal Network for Human Action Recognition

Times cited: 1
Authors
Zhou, Xuan [1]
Ma, Jixiang [1]
Yi, Jianping [2]
Affiliations
[1] Xian Traff Engn Inst, Sch Mech & Elect Engn, Xian 710300, Peoples R China
[2] Xian Polytech Univ, Sch Elect & Informat, Xian 710048, Peoples R China
Source
INFORMATION TECHNOLOGY AND CONTROL | 2023 / Vol. 52 / No. 04
Keywords
Temporal reasoning; Action recognition; Spatiotemporal representation learning; Spatiotemporal fusion;
DOI
10.5755/j01.itc.52.4.33239
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Temporal reasoning is crucial for action recognition. Previous works use 3D CNNs to jointly capture spatiotemporal information, but at a high computational cost. To address this problem, we propose a general channel split spatiotemporal network (CSST-Net) for effective spatiotemporal feature representation learning. The CSST module consists of a grouped spatiotemporal modeling (GSTM) module and a parameter-free feature fusion (PFFF) module. The GSTM module splits features along the channel dimension into spatial and temporal parts, which are processed in parallel and focus on spatial and temporal cues, respectively. In the temporal branch, we combine group-wise and point-wise convolutions to reduce the number of parameters, thus alleviating the overfitting of 3D CNNs. For spatiotemporal feature fusion, the PFFF module recalibrates and fuses the spatial and temporal features with a soft attention mechanism, without introducing extra parameters, thereby preserving the correct flow of information through the network. Finally, extensive experiments on three benchmark datasets (Sth-Sth V1, Sth-Sth V2, and Jester) indicate that the proposed CSST-Net achieves competitive performance compared with existing methods while significantly reducing the number of parameters and FLOPs relative to the 3D CNN baseline.
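The parameter saving from pairing a group-wise convolution with a point-wise convolution can be checked with simple arithmetic. The sketch below is only an illustration of that general counting rule, not the configuration used in the paper: the channel width (64), the kernel size (3x3x3), and the group count (8) are assumed values, and `conv3d_params` is a hypothetical helper.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), groups=1):
    """Number of weights (no bias) in a 3D convolution with the given grouping."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    k_t, k_h, k_w = kernel
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k_t * k_h * k_w


# Assumed, illustrative sizes (not taken from the paper).
channels = 64
groups = 8

# Standard 3D convolution: every output channel mixes all input channels.
full = conv3d_params(channels, channels, (3, 3, 3))

# Group-wise 3x3x3 convolution followed by a 1x1x1 point-wise convolution
# that mixes information across the groups again.
group_wise = conv3d_params(channels, channels, (3, 3, 3), groups=groups)
point_wise = conv3d_params(channels, channels, (1, 1, 1))
factorized = group_wise + point_wise

print(f"standard 3D conv weights:      {full}")
print(f"group-wise + point-wise total: {factorized}")
print(f"reduction factor:              {full / factorized:.2f}x")
```

With these assumed sizes the factorized pair needs roughly one sixth of the weights of the standard convolution; the actual saving in CSST-Net depends on the channel split and group settings reported in the paper.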
Pages: 952-965
Page count: 14