TBRNet: Two-Stream BiLSTM Residual Network for Video Action Recognition

Cited by: 6
Authors
Wu, Xiao [1, 2]
Ji, Qingge [1, 2]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Peoples R China
[2] Guangdong Key Lab Big Data Anal & Proc, Guangzhou 510006, Peoples R China
Keywords
action recognition; bidirectional long short-term memory; residual connection; temporal attention mechanism; two-stream networks
DOI
10.3390/a13070169
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Modeling spatiotemporal representations is one of the most essential yet challenging issues in video action recognition. Existing methods lack the capacity to accurately model either the correlations between spatial and temporal features or the global temporal dependencies. Inspired by the two-stream network for video action recognition, we propose an encoder-decoder framework named Two-Stream Bidirectional Long Short-Term Memory (LSTM) Residual Network (TBRNet) which takes advantage of the interaction between spatiotemporal representations and global temporal dependencies. In the encoding phase, the two-stream architecture, based on the proposed Residual Convolutional 3D (Res-C3D) network, extracts features with residual connections inserted between the two pathways, and then the features are fused to become the short-term spatiotemporal features of the encoder. In the decoding phase, those short-term spatiotemporal features are first fed into a temporal attention-based bidirectional LSTM (BiLSTM) network to obtain long-term bidirectional attention-pooling dependencies. Subsequently, those temporal dependencies are integrated with short-term spatiotemporal features to obtain global spatiotemporal relationships. On two benchmark datasets, UCF101 and HMDB51, we verified the effectiveness of our proposed TBRNet by a series of experiments, and it achieved competitive or even better results compared with existing state-of-the-art approaches.
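The decoding phase described in the abstract (attention pooling over BiLSTM outputs, then integration with the short-term features) can be illustrated with a minimal sketch. This is not the authors' implementation. The dot-product scoring against a `query` vector, the element-wise sum used for fusion, and the function names `attention_pool` and `fuse` are all assumptions made for illustration, and plain lists stand in for the BiLSTM hidden states and the Res-C3D clip features.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: Sequence[Vector], query: Vector) -> Vector:
    """Collapse T hidden states into one vector using softmax attention weights.

    Assumption: each time step is scored by a dot product with a learned
    `query` vector; the paper's actual scoring function may differ.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]


def fuse(long_term: Vector, short_term_clips: Sequence[Vector]) -> Vector:
    """Combine the pooled long-term vector with averaged short-term features.

    Assumption: fusion is an element-wise sum (a residual-style shortcut);
    the abstract only says the two are "integrated".
    """
    steps = len(short_term_clips)
    mean_short = [
        sum(clip[d] for clip in short_term_clips) / steps
        for d in range(len(long_term))
    ]
    return [a + b for a, b in zip(long_term, mean_short)]


# Toy stand-ins: two time steps, two feature dimensions.
clip_features = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical short-term features
hidden_states = clip_features               # stand-in for BiLSTM outputs
pooled = attention_pool(hidden_states, query=[0.0, 0.0])
video_vector = fuse(pooled, clip_features)
```

With a zero `query`, every time step receives the same weight, so the pooling reduces to a plain temporal average. A trained query would instead emphasize the time steps most informative for the action class.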
Pages: 1 / 21
Page count: 21