A Spatiotemporal Heterogeneous Two-Stream Network for Action Recognition

Cited by: 23
Authors
Chen, Enqing [1 ,2 ]
Bai, Xue [1 ,2 ]
Gao, Lei [3 ]
Tinega, Haron Chweya [1 ,2 ]
Ding, Yingqiang [1 ,2 ]
Affiliations
[1] Zhengzhou Univ, Sch Informat Engn, Zhengzhou 450001, Henan, Peoples R China
[2] Zhengzhou Univ, Ind Technol Res Inst, Zhengzhou 450001, Henan, Peoples R China
[3] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada
Source
IEEE ACCESS | 2019 / Vol. 7
Keywords
Action recognition; spatiotemporal heterogeneous; two-stream networks; ResNet; long-range temporal structure; training strategies;
DOI
10.1109/ACCESS.2019.2910604
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Methods based on two-stream networks have achieved great success in video action recognition. However, most existing methods employ the same structure for both the spatial and temporal networks, which leads to unsatisfactory performance. In this paper, we propose a spatiotemporal heterogeneous two-stream network that employs two different network structures for the spatial and temporal information, respectively. Specifically, the Residual Network (ResNet) and BN-Inception are utilized as the base networks to represent the spatiotemporal characteristics of different human actions. In addition, a segmental architecture is employed to model long-range temporal structure over video sequences, in order to better distinguish similar actions that share common sub-actions. Moreover, combined with a data-augmentation strategy, a modified cross-modal pre-training strategy is proposed and applied to the spatiotemporal heterogeneous network to improve the final performance of human action recognition. Experiments on the UCF101 and HMDB51 datasets demonstrate that the proposed spatiotemporal heterogeneous two-stream network outperforms spatiotemporal isomorphic networks and other related methods.
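The segmental architecture the abstract describes can be sketched as a TSN-style consensus (averaging per-segment class scores into one video-level prediction) followed by weighted late fusion of the spatial and temporal streams. The sketch below is illustrative only: the scores, the 1:1.5 fusion weighting, and the function names are assumptions, not taken from the paper, and each stream's per-segment scores stand in for the outputs of the actual ResNet and BN-Inception base networks.

```python
def segmental_consensus(segment_scores):
    """Average per-segment class-score vectors into one video-level score.

    segment_scores: list of per-segment score lists, one score per class.
    """
    n_classes = len(segment_scores[0])
    n_segments = len(segment_scores)
    return [sum(s[c] for s in segment_scores) / n_segments
            for c in range(n_classes)]

def fuse_streams(spatial, temporal, w_spatial=1.0, w_temporal=1.5):
    """Late fusion: weighted sum of the two streams' video-level scores.

    The 1:1.5 spatial:temporal weighting is a common two-stream choice,
    not a value reported in this paper.
    """
    return [w_spatial * s + w_temporal * t for s, t in zip(spatial, temporal)]

# Toy example: 3 segments sampled from one video, 4 action classes.
spatial_segments = [[0.1, 0.6, 0.2, 0.1],
                    [0.2, 0.5, 0.2, 0.1],
                    [0.0, 0.7, 0.2, 0.1]]
temporal_segments = [[0.3, 0.3, 0.3, 0.1],
                     [0.2, 0.4, 0.3, 0.1],
                     [0.1, 0.5, 0.3, 0.1]]

spatial_video = segmental_consensus(spatial_segments)
temporal_video = segmental_consensus(temporal_segments)
fused = fuse_streams(spatial_video, temporal_video)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(predicted_class)  # index of the highest fused score
```

Averaging over segments sampled across the whole video is what lets the model capture long-range temporal structure with only a handful of frames per stream.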
Pages: 57267-57275
Page count: 9
Related Papers
(50 records in total)
  • [21] Li, Min; Qi, Yuezhu; Yang, Jian; Zhang, Yanfang; Ren, Junxing; Du, Hong. 3D Convolutional Two-Stream Network for Action Recognition in Videos. 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI 2019), 2019: 1697-1701.
  • [22] Xu, Mingze; Sharghi, Aidean; Chen, Xin; Crandall, David J. Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV 2018), 2018: 1597-1605.
  • [23] Ghrab, Najla Bouarada; Fendri, Emna; Hammami, Mohamed. Two-stream Deep Representation for Human Action Recognition. Fourteenth International Conference on Machine Vision (ICMV 2021), 2022, 12084.
  • [24] Zhao, Yuxuan; Man, Ka Lok; Smith, Jeremy; Siddique, Kamran; Guan, Sheng-Uei. Improved two-stream model for human action recognition. EURASIP Journal on Image and Video Processing, 2020, 2020 (01).
  • [25] Zhu, Yi; Lan, Zhenzhong; Newsam, Shawn; Hauptmann, Alexander. Hidden Two-Stream Convolutional Networks for Action Recognition. Computer Vision - ACCV 2018, Pt III, 2019, 11363: 363-378.
  • [26] Xu, Ke; Jiang, Xinghao; Sun, Tanfeng. Two-Stream Dictionary Learning Architecture for Action Recognition. IEEE Transactions on Circuits and Systems for Video Technology, 2017, 27 (03): 567-576.
  • [27] Zhu, Jiagang; Zou, Wei; Zhu, Zheng. Two-Stream Gated Fusion ConvNets for Action Recognition. 2018 24th International Conference on Pattern Recognition (ICPR), 2018: 597-602.
  • [28] Simonyan, Karen; Zisserman, Andrew. Two-Stream Convolutional Networks for Action Recognition in Videos. Advances in Neural Information Processing Systems 27 (NIPS 2014), 2014, 27.
  • [29] Zhao, Yuxuan; Man, Ka Lok; Smith, Jeremy; Siddique, Kamran; Guan, Sheng-Uei. Improved two-stream model for human action recognition. EURASIP Journal on Image and Video Processing, 2020.
  • [30] Hu, Min; Hu, Ruimin; Wang, Zhongyuan; Xiong, Zixiang; Zhong, Rui. Spatiotemporal two-stream LSTM network for unsupervised video summarization. Multimedia Tools and Applications, 2022, 81: 40489-40510.