Temporal interaction and excitation for action recognition

Times Cited: 0
Authors
Wang, Chenwu [1 ,2 ]
Yang, Linfeng [3 ]
Zhu, Zhixiang [1 ]
Wang, Pei [1 ]
Nan, Ajian [4 ]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Modern Post, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[3] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian, Peoples R China
[4] Shaanxi Informat Engn Res Inst, Data Operat Dept, Xian, Peoples R China
Keywords
action recognition; temporal interaction fusion; motion excitation
DOI
10.1117/1.JEI.32.4.043028
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
Two-stream networks have been widely used in action recognition, integrating appearance information from RGB frames with motion-rich optical flow, and have achieved impressive recognition accuracy. However, two-stream networks built on 2D convolutional neural networks (CNNs) have clear drawbacks: computing optical flow is resource intensive and time consuming, and 2D CNNs cannot model temporal information. To alleviate these problems, we propose a temporal interaction and excitation (TIE) module that can be embedded into existing 2D CNNs in a plug-and-play manner. It comprises two components: the temporal interaction fusion (TIF) module and the motion excitation (ME) module. The TIF module employs channel-wise temporal convolution to adaptively fuse information from adjacent video frames, enabling information exchange between adjacent frames while preserving the network's spatial feature learning capability. The ME module is a lightweight motion extraction module that leverages differences between adjacent frames to model feature-level motion information, replacing the need for traditional optical flow. These enhancements help the model better understand human actions in videos. The TIE module is designed to improve network performance while adding minimal computational cost. Experimental results demonstrate that our proposed method adds only a few parameters and little computation while achieving competitive results on the HMDB-51 and UCF-101 datasets. (c) 2023 SPIE and IS&T
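The two components described in the abstract map naturally onto a few lines of tensor code. The following is a minimal PyTorch sketch, assuming frame features laid out as (N*T, C, H, W): channel-wise temporal convolution for the TIF idea and adjacent-frame feature differences turned into a sigmoid gate for the ME idea. The class name TIESketch, the temporal kernel size, the channel-reduction ratio, and the residual combination are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two ideas described in the abstract (assumptions,
# not the paper's code): TIF as a depthwise (channel-wise) temporal
# convolution over adjacent frames, and ME as a gate computed from
# feature-level differences between adjacent frames.
import torch
import torch.nn as nn


class TIESketch(nn.Module):
    def __init__(self, channels: int, num_segments: int, reduction: int = 16):
        super().__init__()
        self.t = num_segments
        # TIF: depthwise 1D convolution along the temporal axis (kernel 3),
        # so each channel adaptively mixes information from adjacent frames.
        self.temporal_conv = nn.Conv1d(
            channels, channels, kernel_size=3, padding=1, groups=channels
        )
        # ME: squeeze channels, difference adjacent frames, expand back,
        # and turn the result into a sigmoid gate.
        mid = max(channels // reduction, 8)
        self.squeeze = nn.Conv2d(channels, mid, kernel_size=1)
        self.expand = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        nt, c, h, w = x.shape
        n = nt // self.t

        # --- Temporal interaction fusion (TIF) ---
        # (N*T, C, H, W) -> (N*H*W, C, T) so Conv1d runs over the T axis.
        feat = x.view(n, self.t, c, h, w).permute(0, 3, 4, 2, 1)
        feat = feat.reshape(n * h * w, c, self.t)
        feat = self.temporal_conv(feat)
        feat = feat.reshape(n, h, w, c, self.t).permute(0, 4, 3, 1, 2)
        fused = feat.reshape(nt, c, h, w)

        # --- Motion excitation (ME) ---
        red = self.squeeze(x).view(n, self.t, -1, h, w)
        # Feature-level "motion": frame t+1 minus frame t; pad the last
        # step with zeros so the temporal length stays T.
        diff = red[:, 1:] - red[:, :-1]
        diff = torch.cat([diff, torch.zeros_like(red[:, :1])], dim=1)
        gate = torch.sigmoid(self.expand(diff.reshape(nt, -1, h, w)))

        # Residual-style combination keeps the module plug-and-play.
        return x + fused * gate
```

In this sketch, a block such as TIESketch(channels=64, num_segments=8) applied to a tensor of shape (2*8, 64, 56, 56) returns a tensor of the same shape, so it could in principle be dropped after a convolution stage of a 2D backbone without changing downstream layers; how the actual TIE module is wired into the network is described in the paper itself.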
Pages: 11