Temporal interaction and excitation for action recognition

Times Cited: 0
Authors
Wang, Chenwu [1,2]
Yang, Linfeng [3 ]
Zhu, Zhixiang [1 ]
Wang, Pei [1 ]
Nan, Ajian [4 ]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Modern Post, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[3] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian, Peoples R China
[4] Shaanxi Informat Engn Res Inst, Data Operat Dept, Xian, Peoples R China
Keywords
action recognition; temporal interaction fusion; motion excitation
DOI
10.1117/1.JEI.32.4.043028
CLC classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract
Two-stream networks have been widely used in action recognition, integrating the appearance information from RGB frames with the motion-rich optical flow data and achieving impressive recognition accuracy. However, the drawbacks of two-stream networks built on 2D convolutional neural networks (CNNs) are apparent: computing optical flow is resource-intensive and time-consuming, and 2D CNNs cannot model temporal information. To alleviate these problems, we propose a temporal interaction and excitation (TIE) module that can be embedded into existing 2D CNNs in a plug-and-play manner. It comprises two components: the temporal interaction fusion (TIF) module and the motion excitation (ME) module. The TIF module employs channel-wise temporal convolution to adaptively fuse information from adjacent video frames, enabling information exchange between neighboring frames while preserving the network's spatial feature learning capability. The ME module is a lightweight motion extraction module that leverages the differences between adjacent frames to model feature-level motion information, removing the need for traditional optical flow. This enhancement helps the model better understand human actions in videos. The TIE module is designed to improve network performance while adding minimal computational cost. Experimental results demonstrate that our proposed method adds only a few parameters and little computational cost while achieving competitive results on the HMDB-51 and UCF-101 datasets. (c) 2023 SPIE and IS&T
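The two mechanisms the abstract describes can be illustrated in a minimal NumPy sketch. This is not the authors' implementation: the tensor layout (T, C, H, W), the 3-tap per-channel temporal kernel, and the sigmoid-gated frame-difference attention are assumptions made for illustration, based only on the abstract's description of channel-wise temporal convolution (TIF) and feature-level frame differencing (ME).

```python
import numpy as np

def temporal_interaction_fusion(feats, kernel):
    """TIF sketch: channel-wise 1-D temporal convolution.
    Each channel mixes a frame with its two temporal neighbours
    using its own 3-tap kernel; spatial dims are untouched."""
    T, C, H, W = feats.shape
    padded = np.pad(feats, ((1, 1), (0, 0), (0, 0), (0, 0)), mode="edge")
    out = np.zeros_like(feats)
    for t in range(T):
        window = padded[t:t + 3]                     # (3, C, H, W)
        # sum_k window[k, c, h, w] * kernel[c, k]
        out[t] = np.einsum("kchw,ck->chw", window, kernel)
    return out

def motion_excitation(feats):
    """ME sketch: forward frame differences as a cheap motion cue;
    a sigmoid gate reweights the features (no optical flow)."""
    diff = np.zeros_like(feats)
    diff[:-1] = feats[1:] - feats[:-1]               # feature-level motion
    gate = 1.0 / (1.0 + np.exp(-diff.mean(axis=(2, 3), keepdims=True)))
    return feats * gate                              # excite motion channels

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 6, 6))                # (frames, channels, H, W)
k = np.tile([[0.25, 0.5, 0.25]], (8, 1))             # one smoothing tap set per channel
y = motion_excitation(temporal_interaction_fusion(x, k))
print(y.shape)  # (4, 8, 6, 6)
```

In a real network the temporal kernels would be learned and the gate produced by small convolutions, but the sketch shows why the module is cheap: per-channel 3-tap temporal mixing and a frame difference add almost no parameters relative to the 2D backbone.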
Pages: 11