Temporal interaction and excitation for action recognition

Times Cited: 0
Authors
Wang, Chenwu [1,2]
Yang, Linfeng [3 ]
Zhu, Zhixiang [1 ]
Wang, Pei [1 ]
Nan, Ajian [4 ]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Modern Post, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[3] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian, Peoples R China
[4] Shaanxi Informat Engn Res Inst, Data Operat Dept, Xian, Peoples R China
Keywords
action recognition; temporal interaction fusion; motion excitation
DOI
10.1117/1.JEI.32.4.043028
CLC classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract
Two-stream networks have been widely used in action recognition, integrating the appearance information from RGB frames with the motion-rich optical flow data and achieving impressive recognition accuracy. However, the drawbacks of two-stream networks built on 2D convolutional neural networks (CNNs) are apparent: computing optical flow is resource-intensive and time-consuming, and 2D CNNs cannot model temporal information. To alleviate these problems, we propose a temporal interaction and excitation (TIE) module that can be embedded into existing 2D CNNs in a plug-and-play manner. It comprises two components: the temporal interaction fusion (TIF) module and the motion excitation (ME) module. The TIF module employs channel-wise temporal convolution to adaptively fuse information from adjacent video frames, enabling information exchange between neighboring frames while preserving the network's spatial feature learning capability. The ME module is a lightweight motion extraction module that leverages the differences between adjacent frames to model feature-level motion information, removing the need for traditional optical flow. This enhancement helps the model better understand human actions in videos. The TIE module is designed to improve network performance while adding minimal computational cost. Experimental results demonstrate that our proposed method adds only a few parameters and little computational cost while achieving competitive results on the HMDB-51 and UCF-101 datasets. (c) 2023 SPIE and IS&T
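The two mechanisms the abstract describes can be illustrated in a minimal NumPy sketch. This is not the authors' implementation: the tensor layout (T, C, H, W), the 3-tap per-channel temporal kernel, and the sigmoid-gated frame-difference attention are assumptions made for illustration, based only on the abstract's description of channel-wise temporal convolution (TIF) and feature-level frame differencing (ME).

```python
import numpy as np

def temporal_interaction_fusion(feats, kernel):
    """TIF sketch: channel-wise 1-D temporal convolution.
    Each channel mixes a frame with its two temporal neighbours
    using its own 3-tap kernel; spatial dims are untouched."""
    T, C, H, W = feats.shape
    padded = np.pad(feats, ((1, 1), (0, 0), (0, 0), (0, 0)), mode="edge")
    out = np.zeros_like(feats)
    for t in range(T):
        window = padded[t:t + 3]                     # (3, C, H, W)
        # sum_k window[k, c, h, w] * kernel[c, k]
        out[t] = np.einsum("kchw,ck->chw", window, kernel)
    return out

def motion_excitation(feats):
    """ME sketch: forward frame differences as a cheap motion cue;
    a sigmoid gate reweights the features (no optical flow)."""
    diff = np.zeros_like(feats)
    diff[:-1] = feats[1:] - feats[:-1]               # feature-level motion
    gate = 1.0 / (1.0 + np.exp(-diff.mean(axis=(2, 3), keepdims=True)))
    return feats * gate                              # excite motion channels

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 6, 6))                # (frames, channels, H, W)
k = np.tile([[0.25, 0.5, 0.25]], (8, 1))             # one smoothing tap set per channel
y = motion_excitation(temporal_interaction_fusion(x, k))
print(y.shape)  # (4, 8, 6, 6)
```

In a real network the temporal kernels would be learned and the gate produced by small convolutions, but the sketch shows why the module is cheap: per-channel 3-tap temporal mixing and a frame difference add almost no parameters relative to the 2D backbone.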
Pages: 11