A Spatiotemporal Heterogeneous Two-Stream Network for Action Recognition

Cited by: 23
Authors
Chen, Enqing [1, 2]
Bai, Xue [1, 2]
Gao, Lei [3]
Tinega, Haron Chweya [1, 2]
Ding, Yingqiang [1, 2]
Affiliations
[1] Zhengzhou Univ, Sch Informat Engn, Zhengzhou 450001, Henan, Peoples R China
[2] Zhengzhou Univ, Ind Technol Res Inst, Zhengzhou 450001, Henan, Peoples R China
[3] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada
Source
IEEE Access, 2019, Vol. 7
Keywords
Action recognition; spatiotemporal heterogeneous; two-stream networks; ResNet; long-range temporal structure; training strategies
DOI
10.1109/ACCESS.2019.2910604
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Two-stream networks have achieved great success in video action recognition. However, most existing methods use the same structure for both the spatial and temporal networks, which leads to unsatisfactory performance. In this paper, we propose a spatiotemporal heterogeneous two-stream network that employs two different network structures for spatial and temporal information, respectively. Specifically, the residual network (ResNet) and BN-Inception are used as the base networks to represent the spatiotemporal characteristics of different human actions. In addition, a segmental architecture is employed to model long-range temporal structure over video sequences, to better distinguish similar actions that share sub-actions. Moreover, combined with data augmentation, a modified cross-modal pre-training strategy is proposed and applied to the spatiotemporal heterogeneous network to improve the final performance of human action recognition. Experiments on the UCF101 and HMDB51 datasets demonstrate that the proposed spatiotemporal heterogeneous two-stream network outperforms spatiotemporal isomorphic networks and other related methods.
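The segmental architecture and the fusion of the two streams can be illustrated with a minimal sketch. The abstract does not specify the sampling rule, the consensus function, or the fusion weights, so everything below is an assumption made for illustration: the function names are hypothetical, one center frame is taken per segment, per-segment class scores are averaged, and the two streams are combined by a weighted sum.

```python
from typing import List, Sequence


def segment_indices(num_frames: int, num_segments: int) -> List[int]:
    """Return one frame index per segment (assumed rule: the segment's center frame)."""
    if num_frames < 1 or num_segments < 1:
        raise ValueError("num_frames and num_segments must be positive")
    seg_len = num_frames / num_segments
    # Clamp to the last valid index so very short videos still work.
    return [min(num_frames - 1, int(seg_len * (k + 0.5))) for k in range(num_segments)]


def average_scores(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Assumed consensus: average the per-class scores over all segments."""
    n = len(segment_scores)
    return [sum(per_class) / n for per_class in zip(*segment_scores)]


def fuse_streams(
    spatial: Sequence[float],
    temporal: Sequence[float],
    temporal_weight: float = 1.5,  # assumed value, not taken from the paper
) -> List[float]:
    """Assumed late fusion: weighted sum of spatial and temporal class scores."""
    return [s + temporal_weight * t for s, t in zip(spatial, temporal)]
```

For example, `segment_indices(90, 3)` picks one frame from each third of a 90-frame video. In a heterogeneous setup, the spatial scores would come from one backbone and the temporal scores from a different one; which backbone serves which stream is not stated in the abstract.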
Pages: 57267-57275
Page count: 9