TBRNet: Two-Stream BiLSTM Residual Network for Video Action Recognition

Cited by: 6

Authors:

Wu, Xiao [1,2]

Ji, Qingge [1,2]

Affiliations:

[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Peoples R China

[2] Guangdong Key Lab Big Data Anal & Proc, Guangzhou 510006, Peoples R China

Source:

ALGORITHMS | 2020 / Vol. 13 / No. 07

Keywords:

action recognition; bidirectional long short-term memory; residual connection; temporal attention mechanism; two-stream networks;

DOI:

10.3390/a13070169

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Modeling spatiotemporal representations is one of the most essential yet challenging issues in video action recognition. Existing methods lack the capacity to accurately model either the correlations between spatial and temporal features or the global temporal dependencies. Inspired by the two-stream network for video action recognition, we propose an encoder-decoder framework named Two-Stream Bidirectional Long Short-Term Memory (LSTM) Residual Network (TBRNet) which takes advantage of the interaction between spatiotemporal representations and global temporal dependencies. In the encoding phase, the two-stream architecture, based on the proposed Residual Convolutional 3D (Res-C3D) network, extracts features with residual connections inserted between the two pathways, and then the features are fused to become the short-term spatiotemporal features of the encoder. In the decoding phase, those short-term spatiotemporal features are first fed into a temporal attention-based bidirectional LSTM (BiLSTM) network to obtain long-term bidirectional attention-pooling dependencies. Subsequently, those temporal dependencies are integrated with short-term spatiotemporal features to obtain global spatiotemporal relationships. On two benchmark datasets, UCF101 and HMDB51, we verified the effectiveness of our proposed TBRNet by a series of experiments, and it achieved competitive or even better results compared with existing state-of-the-art approaches.
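The decoding step described in the abstract (attention pooling over per-time-step features, then integration with the short-term features) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the scoring function (a dot product with a hypothetical vector `w` after `tanh`), the use of plain lists in place of BiLSTM outputs, and the additive fusion with the mean short-term feature are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention_pool(hidden_states, w):
    """Pool T per-time-step vectors into one vector with attention weights.

    hidden_states: list of T vectors (lists of D floats); a stand-in for
                   BiLSTM outputs, which are not computed here.
    w:             hypothetical scoring vector of length D (assumed; it
                   would be learned in a real model).
    """
    # One scalar score per time step: w . tanh(h_t)
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h))
              for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    # Attention-weighted sum over time
    pooled = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
              for d in range(dim)]
    return pooled, alphas

def fuse_with_short_term(pooled, short_term_features):
    """Assumed fusion: add the pooled vector to the mean short-term feature."""
    steps = len(short_term_features)
    dim = len(pooled)
    mean_short = [sum(f[d] for f in short_term_features) / steps
                  for d in range(dim)]
    return [p + m for p, m in zip(pooled, mean_short)]

# Toy input: 3 time steps, 2-dimensional features
features = [[0.1, 0.2], [0.4, -0.3], [0.9, 0.5]]
pooled, alphas = temporal_attention_pool(features, w=[1.0, 1.0])
global_repr = fuse_with_short_term(pooled, features)
```

In this toy input the third time step has the largest score, so it receives the largest attention weight. The additive fusion mirrors a residual connection in spirit only; the paper may combine the two feature sets differently.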


Pages: 1 / 21

Page count: 21
