Temporal interaction and excitation for action recognition

Times Cited: 0
Authors
Wang, Chenwu [1 ,2 ]
Yang, Linfeng [3 ]
Zhu, Zhixiang [1 ]
Wang, Pei [1 ]
Nan, Ajian [4 ]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Modern Post, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[3] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian, Peoples R China
[4] Shaanxi Informat Engn Res Inst, Data Operat Dept, Xian, Peoples R China
Keywords
action recognition; temporal interaction fusion; motion excitation
DOI
10.1117/1.JEI.32.4.043028
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
Two-stream networks have been widely used in action recognition, integrating appearance information from RGB frames with motion-rich optical flow, and have achieved impressive recognition accuracy. However, two-stream networks built on 2D convolutional neural networks (CNNs) have clear drawbacks: computing optical flow is resource intensive and time consuming, and 2D CNNs cannot model temporal information. To alleviate these problems, we propose a temporal interaction and excitation (TIE) module that can be embedded into existing 2D CNNs in a plug-and-play manner. It comprises two components: the temporal interaction fusion (TIF) module and the motion excitation (ME) module. The TIF module employs channel-wise temporal convolution to adaptively fuse information from adjacent video frames, enabling information exchange between adjacent frames while preserving the network's spatial feature learning capability. The ME module is a lightweight motion extraction module that leverages differences between adjacent frames to model feature-level motion information, replacing the need for traditional optical flow. These enhancements help the model better understand human actions in videos. The TIE module is designed to improve network performance while adding minimal computational cost. Experimental results demonstrate that our proposed method adds only a few parameters and little computation while achieving competitive results on the HMDB-51 and UCF-101 datasets. (c) 2023 SPIE and IS&T
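The two components described in the abstract map naturally onto a few lines of tensor code. The following is a minimal PyTorch sketch, assuming frame features laid out as (N*T, C, H, W): channel-wise temporal convolution for the TIF idea and adjacent-frame feature differences turned into a sigmoid gate for the ME idea. The class name TIESketch, the temporal kernel size, the channel-reduction ratio, and the residual combination are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two ideas described in the abstract (assumptions,
# not the paper's code): TIF as a depthwise (channel-wise) temporal
# convolution over adjacent frames, and ME as a gate computed from
# feature-level differences between adjacent frames.
import torch
import torch.nn as nn


class TIESketch(nn.Module):
    def __init__(self, channels: int, num_segments: int, reduction: int = 16):
        super().__init__()
        self.t = num_segments
        # TIF: depthwise 1D convolution along the temporal axis (kernel 3),
        # so each channel adaptively mixes information from adjacent frames.
        self.temporal_conv = nn.Conv1d(
            channels, channels, kernel_size=3, padding=1, groups=channels
        )
        # ME: squeeze channels, difference adjacent frames, expand back,
        # and turn the result into a sigmoid gate.
        mid = max(channels // reduction, 8)
        self.squeeze = nn.Conv2d(channels, mid, kernel_size=1)
        self.expand = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        nt, c, h, w = x.shape
        n = nt // self.t

        # --- Temporal interaction fusion (TIF) ---
        # (N*T, C, H, W) -> (N*H*W, C, T) so Conv1d runs over the T axis.
        feat = x.view(n, self.t, c, h, w).permute(0, 3, 4, 2, 1)
        feat = feat.reshape(n * h * w, c, self.t)
        feat = self.temporal_conv(feat)
        feat = feat.reshape(n, h, w, c, self.t).permute(0, 4, 3, 1, 2)
        fused = feat.reshape(nt, c, h, w)

        # --- Motion excitation (ME) ---
        red = self.squeeze(x).view(n, self.t, -1, h, w)
        # Feature-level "motion": frame t+1 minus frame t; pad the last
        # step with zeros so the temporal length stays T.
        diff = red[:, 1:] - red[:, :-1]
        diff = torch.cat([diff, torch.zeros_like(red[:, :1])], dim=1)
        gate = torch.sigmoid(self.expand(diff.reshape(nt, -1, h, w)))

        # Residual-style combination keeps the module plug-and-play.
        return x + fused * gate
```

In this sketch, a block such as TIESketch(channels=64, num_segments=8) applied to a tensor of shape (2*8, 64, 56, 56) returns a tensor of the same shape, so it could in principle be dropped after a convolution stage of a 2D backbone without changing downstream layers; how the actual TIE module is wired into the network is described in the paper itself.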
Pages: 11