Spatial-temporal interaction learning based two-stream network for action recognition

Cited by: 39

Authors:

Liu, Tianyu [1]

Ma, Yujun [2]

Yang, Wenhan [1]

Ji, Wanting [3]

Wang, Ruili [2]

Jiang, Ping [1]

Affiliations:

[1] Hunan Agr Univ, Coll Mech & Elect Engn, Changsha, Peoples R China

[2] Massey Univ, Sch Math & Computat Sci, Auckland, New Zealand

[3] Liaoning Univ, Sch Informat, Shenyang, Peoples R China

Source:

INFORMATION SCIENCES | 2022 / Vol. 606

Keywords:

Action recognition; Spatial-temporal; Two-stream CNNs;

DOI:

10.1016/j.ins.2022.05.092

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Two-stream convolutional neural networks have been widely applied to action recognition. However, two-stream networks are usually adopted to capture spatial information and temporal information separately, which normally ignores the strong complementarity and correlation between spatial and temporal information in videos. To solve this problem, we propose a Spatial-Temporal Interaction Learning Two-stream network (STILT) for action recognition. Our proposed two-stream (i.e., a spatial stream and a temporal stream) network has a spatial-temporal interaction learning module, which uses an alternating co-attention mechanism between the two streams to learn the correlation between spatial features and temporal features. The spatial-temporal interaction learning module allows the two streams to guide each other and then generates optimized spatial attention features and temporal attention features. Thus, the proposed network can establish an interactive connection between the two streams, which efficiently exploits the attended spatial and temporal features to improve recognition accuracy. Experiments on three widely used datasets (i.e., UCF101, HMDB51 and Kinetics) show that the proposed network outperforms the state-of-the-art models in action recognition. © 2022 Elsevier Inc. All rights reserved.
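
The abstract does not give the equations of the interaction module, so the following is only a minimal sketch of what a generic alternating co-attention step between two feature sets might look like, not the STILT implementation. The function names, the mean-pooled initial summary, the scaled dot-product scoring, and the feature shapes are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(features, guide):
    # Score each feature row against the guide vector (scaled dot product),
    # then return the attention-weighted sum of rows and the weights.
    scores = features @ guide / np.sqrt(features.shape[1])
    weights = softmax(scores)
    return weights @ features, weights

def alternating_co_attention(spatial, temporal):
    # Hypothetical three-step alternation (an assumption, not the paper's module):
    # 1) summarise the spatial stream without guidance (mean pooling),
    s0 = spatial.mean(axis=0)
    # 2) attend over temporal features, guided by the spatial summary,
    t_att, t_w = attend(temporal, s0)
    # 3) attend over spatial features, guided by the attended temporal vector.
    s_att, s_w = attend(spatial, t_att)
    return s_att, t_att, s_w, t_w

rng = np.random.default_rng(0)
spatial = rng.normal(size=(49, 64))   # hypothetical: 7x7 locations, 64 channels
temporal = rng.normal(size=(10, 64))  # hypothetical: 10 motion snippets, 64 channels
s_att, t_att, s_w, t_w = alternating_co_attention(spatial, temporal)
print(s_att.shape, t_att.shape)
```

In this sketch each stream's attention weights depend on the other stream's features, which is the sense in which the two streams "guide each other"; a trained network would add learned projections before scoring.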


Pages: 864-876

Number of pages: 13

50 items in total

[1] Spatial-temporal multiscale feature optimization based two-stream convolutional neural network for action recognition
Xia, Limin
Fu, Weiye
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (08) : 11611 - 11626
[2] Two-stream spatial-temporal neural networks for pose-based action recognition
Wang, Zixuan
Zhu, Aichun
Hu, Fangqiang
Wu, Qianyu
Li, Yifeng
JOURNAL OF ELECTRONIC IMAGING, 2020, 29 (04)
[3] Two-Stream Collaborative Learning With Spatial-Temporal Attention for Video Classification
Peng, Yuxin
Zhao, Yunzhen
Zhang, Junchao
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (03) : 773 - 786
[4] Hidden Two-Stream Collaborative Learning Network for Action Recognition
Zhou, Shuren
Chen, Le
Sugumaran, Vijayan
CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 63 (03) : 1545 - 1561
[5] Spatial-temporal interaction module for action recognition
Luo, Hui-Lan
Chen, Han
Cheung, Yiu-Ming
Yu, Yawei
JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (04)
[6] Human Action Recognition Based on Improved Two-Stream Convolution Network
Wang, Zhongwen
Lu, Haozhu
Jin, Junlan
Hu, Kai
APPLIED SCIENCES-BASEL, 2022, 12 (12)
[7] A Multimode Two-Stream Network for Egocentric Action Recognition
Li, Ying
Shen, Jie
Xiong, Xin
He, Wei
Li, Peng
Yan, Wenjie
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT I, 2021, 12891 : 357 - 368
[8] A Spatiotemporal Heterogeneous Two-Stream Network for Action Recognition
Chen, Enqing
Bai, Xue
Gao, Lei
Tinega, Haron Chweya
Ding, Yingqiang
IEEE ACCESS, 2019, 7 : 57267 - 57275
[9] Two-Stream Dictionary Learning Architecture for Action Recognition
Xu, Ke
Jiang, Xinghao
Sun, Tanfeng
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2017, 27 (03) : 567 - 576
[10] Human Action Recognition based on Two-Stream Ind Recurrent Neural Network
Ge Penghua
Zhi Min
TENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2018), 2019, 11069
