CSST-Net: Channel Split Spatiotemporal Network for Human Action Recognition

Cited by: 1
Authors
Zhou, Xuan [1 ]
Ma, Jixiang [1 ]
Yi, Jianping [2 ]
Affiliations
[1] Xian Traff Engn Inst, Sch Mech & Elect Engn, Xian 710300, Peoples R China
[2] Xian Polytech Univ, Sch Elect & Informat, Xian 710048, Peoples R China
Source
INFORMATION TECHNOLOGY AND CONTROL | 2023, Vol. 52, No. 4
Keywords
Temporal reasoning; Action recognition; Spatiotemporal representation learning; Spatiotemporal fusion;
DOI
10.5755/j01.itc.52.4.33239
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Temporal reasoning is crucial for action recognition tasks. Previous works use 3D CNNs to jointly capture spatiotemporal information, but this incurs substantial computational cost. To address these problems, we propose a general channel split spatiotemporal network (CSST-Net) for effective spatiotemporal representation learning. The CSST module consists of a grouped spatiotemporal modeling (GSTM) module and a parameter-free feature fusion (PFFF) module. The GSTM module decomposes features along the channel dimension into parallel spatial and temporal parts, which focus on spatial and temporal clues, respectively. Meanwhile, we combine group-wise convolution and point-wise convolution to reduce the number of parameters in the temporal branch, thereby alleviating the overfitting of 3D CNNs. Furthermore, to address spatiotemporal feature fusion, the PFFF module recalibrates and fuses spatial and temporal features through a soft attention mechanism without introducing extra parameters, thus ensuring correct information flow through the network. Finally, extensive experiments on three benchmark datasets (Sth-Sth V1, Sth-Sth V2, and Jester) show that the proposed CSST-Net achieves competitive performance compared with existing methods while significantly reducing the number of parameters and FLOPs relative to the 3D CNN baseline.
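The abstract describes the GSTM and PFFF modules only at a high level. The PyTorch sketch below illustrates one way such a channel split (spatial branch plus a grouped-then-pointwise temporal branch) and a parameter-free, softmax-based fusion could be wired. The names GSTM and pfff, the 50/50 channel split, the kernel sizes, and the pooling-based attention are assumptions made for illustration, not the authors' exact design.

```python
# Minimal sketch of a channel-split spatiotemporal block (illustrative only).
import torch
import torch.nn as nn


class GSTM(nn.Module):
    """Grouped spatiotemporal modeling: split channels into a spatial and a temporal branch."""

    def __init__(self, channels, groups=8):
        super().__init__()
        # Assumed 50/50 split so both branches have the same channel count.
        self.c_s = channels // 2
        self.c_t = channels - self.c_s
        # Spatial branch: per-frame 2D convolution (kernel 1x3x3).
        self.spatial = nn.Conv3d(self.c_s, self.c_s, (1, 3, 3), padding=(0, 1, 1))
        # Temporal branch: group-wise temporal conv followed by a point-wise conv,
        # keeping the parameter count low.
        self.temporal_gw = nn.Conv3d(self.c_t, self.c_t, (3, 1, 1),
                                     padding=(1, 0, 0), groups=groups)
        self.temporal_pw = nn.Conv3d(self.c_t, self.c_t, 1)

    def forward(self, x):                                # x: (N, C, T, H, W)
        x_s, x_t = torch.split(x, [self.c_s, self.c_t], dim=1)
        f_s = self.spatial(x_s)                          # spatial clues
        f_t = self.temporal_pw(self.temporal_gw(x_t))    # temporal clues
        return f_s, f_t


def pfff(f_s, f_t):
    """Parameter-free fusion: soft attention from pooled statistics, no learnable weights."""
    s = f_s.mean(dim=(2, 3, 4), keepdim=True)            # (N, C/2, 1, 1, 1)
    t = f_t.mean(dim=(2, 3, 4), keepdim=True)            # (N, C/2, 1, 1, 1)
    # Softmax across the two branch descriptors gives per-channel fusion weights
    # (requires equal branch widths; the actual recalibration may differ).
    w = torch.softmax(torch.stack([s, t], dim=0), dim=0)
    return torch.cat([w[0] * f_s, w[1] * f_t], dim=1)    # back to (N, C, T, H, W)
```

As a usage check, for a clip tensor such as `torch.randn(2, 64, 8, 56, 56)`, `GSTM(64)` returns two tensors with 32 channels each, and `pfff` recombines them into a 64-channel output of the original spatiotemporal size.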
Pages: 952-965
Number of pages: 14