Spatial-temporal interaction learning based two-stream network for action recognition

Cited by: 39
Authors
Liu, Tianyu [1]
Ma, Yujun [2]
Yang, Wenhan [1]
Ji, Wanting [3]
Wang, Ruili [2]
Jiang, Ping [1]
Affiliations
[1] Hunan Agr Univ, Coll Mech & Elect Engn, Changsha, Peoples R China
[2] Massey Univ, Sch Math & Computat Sci, Auckland, New Zealand
[3] Liaoning Univ, Sch Informat, Shenyang, Peoples R China
Keywords
Action recognition; Spatial-temporal; Two-stream CNNs
DOI
10.1016/j.ins.2022.05.092
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Two-stream convolutional neural networks have been widely applied to action recognition. However, two-stream networks usually capture spatial information and temporal information separately, which ignores the strong complementarity and correlation between spatial and temporal information in videos. To solve this problem, we propose a Spatial-Temporal Interaction Learning Two-stream network (STILT) for action recognition. Our proposed two-stream network (i.e., a spatial stream and a temporal stream) has a spatial-temporal interaction learning module, which uses an alternating co-attention mechanism between the two streams to learn the correlation between spatial features and temporal features. The spatial-temporal interaction learning module allows the two streams to guide each other and then generates optimized spatial attention features and temporal attention features. Thus, the proposed network establishes an interactive connection between the two streams and efficiently exploits the attended spatial and temporal features to improve recognition accuracy. Experiments on three widely used datasets (i.e., UCF101, HMDB51 and Kinetics) show that the proposed network outperforms the state-of-the-art models in action recognition. (c) 2022 Elsevier Inc. All rights reserved.
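The abstract does not give the details of the alternating co-attention mechanism, so the following is only a minimal sketch of the general idea, not the STILT module itself. It assumes dot-product scoring with a softmax and no learned weights; the function names (`attend`, `mean_vector`, `alternating_co_attention`) and the use of a mean vector as the initial spatial summary are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, guide):
    """Return the attention-weighted sum of `features`.

    Each feature vector is scored by its dot product with `guide`
    (an assumed scoring rule; the paper may use learned projections).
    """
    scores = [sum(f_k * g_k for f_k, g_k in zip(f, guide)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


def mean_vector(features):
    """Unguided summary of a feature set: the element-wise mean."""
    n = len(features)
    return [sum(f[k] for f in features) / n for k in range(len(features[0]))]


def alternating_co_attention(spatial, temporal):
    """Hypothetical three-step alternation between the two streams."""
    s_summary = mean_vector(spatial)        # step 1: summarize the spatial stream
    t_att = attend(temporal, s_summary)     # step 2: spatial summary guides temporal attention
    s_att = attend(spatial, t_att)          # step 3: attended temporal feature guides spatial attention
    return s_att, t_att
```

Calling `alternating_co_attention(spatial, temporal)` with two lists of equal-length feature vectors returns one attended vector per stream. In a trained network the scoring would presumably involve learned parameters, which this sketch omits.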
Pages: 864-876
Number of pages: 13
Related papers
50 in total
  • [1] Spatial-temporal multiscale feature optimization based two-stream convolutional neural network for action recognition
    Xia, Limin
    Fu, Weiye
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (08): 11611-11626
  • [2] Two-stream spatial-temporal neural networks for pose-based action recognition
    Wang, Zixuan
    Zhu, Aichun
    Hu, Fangqiang
    Wu, Qianyu
    Li, Yifeng
    JOURNAL OF ELECTRONIC IMAGING, 2020, 29 (04)
  • [3] Two-Stream Collaborative Learning With Spatial-Temporal Attention for Video Classification
    Peng, Yuxin
    Zhao, Yunzhen
    Zhang, Junchao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (03): 773-786
  • [4] Hidden Two-Stream Collaborative Learning Network for Action Recognition
    Zhou, Shuren
    Chen, Le
    Sugumaran, Vijayan
    CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 63 (03): 1545-1561
  • [5] Spatial-temporal interaction module for action recognition
    Luo, Hui-Lan
    Chen, Han
    Cheung, Yiu-Ming
    Yu, Yawei
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (04)
  • [6] Human Action Recognition Based on Improved Two-Stream Convolution Network
    Wang, Zhongwen
    Lu, Haozhu
    Jin, Junlan
    Hu, Kai
    APPLIED SCIENCES-BASEL, 2022, 12 (12)
  • [7] A Multimode Two-Stream Network for Egocentric Action Recognition
    Li, Ying
    Shen, Jie
    Xiong, Xin
    He, Wei
    Li, Peng
    Yan, Wenjie
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT I, 2021, 12891: 357-368
  • [8] A Spatiotemporal Heterogeneous Two-Stream Network for Action Recognition
    Chen, Enqing
    Bai, Xue
    Gao, Lei
    Tinega, Haron Chweya
    Ding, Yingqiang
    IEEE ACCESS, 2019, 7: 57267-57275
  • [9] Two-Stream Dictionary Learning Architecture for Action Recognition
    Xu, Ke
    Jiang, Xinghao
    Sun, Tanfeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2017, 27 (03): 567-576
  • [10] Human Action Recognition based on Two-Stream Ind Recurrent Neural Network
    Ge Penghua
    Zhi Min
    TENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2018), 2019, 11069