TBRNet: Two-Stream BiLSTM Residual Network for Video Action Recognition

Cited by: 6
Authors
Wu, Xiao [1, 2]
Ji, Qingge [1, 2]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Peoples R China
[2] Guangdong Key Lab Big Data Anal & Proc, Guangzhou 510006, Peoples R China
Keywords
action recognition; bidirectional long short-term memory; residual connection; temporal attention mechanism; two-stream networks
DOI
10.3390/a13070169
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Modeling spatiotemporal representations is one of the most essential yet challenging issues in video action recognition. Existing methods lack the capacity to accurately model either the correlations between spatial and temporal features or the global temporal dependencies. Inspired by the two-stream network for video action recognition, we propose an encoder-decoder framework named Two-Stream Bidirectional Long Short-Term Memory (LSTM) Residual Network (TBRNet) which takes advantage of the interaction between spatiotemporal representations and global temporal dependencies. In the encoding phase, the two-stream architecture, based on the proposed Residual Convolutional 3D (Res-C3D) network, extracts features with residual connections inserted between the two pathways, and then the features are fused to become the short-term spatiotemporal features of the encoder. In the decoding phase, those short-term spatiotemporal features are first fed into a temporal attention-based bidirectional LSTM (BiLSTM) network to obtain long-term bidirectional attention-pooling dependencies. Subsequently, those temporal dependencies are integrated with short-term spatiotemporal features to obtain global spatiotemporal relationships. On two benchmark datasets, UCF101 and HMDB51, we verified the effectiveness of our proposed TBRNet by a series of experiments, and it achieved competitive or even better results compared with existing state-of-the-art approaches.
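The temporal attention pooling mentioned in the abstract can be illustrated with a minimal sketch. This is a generic dot-product attention over per-step feature vectors, not the authors' implementation: the function names, the `query` vector (standing in for a learned parameter), and the toy inputs are all assumptions made for illustration, and plain Python lists stand in for BiLSTM outputs.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(
    features: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool T per-step feature vectors into a single vector.

    features: T vectors of length D (hypothetical stand-ins for BiLSTM outputs).
    query:    hypothetical learned vector of length D used to score each step.
    Returns the pooled vector and the attention weight of each time step.
    """
    # Score each time step by its dot product with the query vector.
    scores = [sum(f * q for f, q in zip(step, query)) for step in features]
    # Normalise the scores so the weights sum to 1 across time.
    weights = softmax(scores)
    # Weighted sum of the per-step features, computed dimension by dimension.
    dim = len(features[0])
    pooled = [
        sum(w * step[d] for w, step in zip(weights, features))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    steps = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    pooled, weights = temporal_attention_pool(steps, query=[2.0, 0.0])
    print("weights:", weights)
    print("pooled:", pooled)
```

With a zero query every time step receives the same weight, so the pooled vector reduces to the plain temporal average; a non-zero query shifts weight toward the steps it scores highest.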
Pages: 1 / 21
Page count: 21