TwinLSTM: Two-channel LSTM Network for Online Action Detection

Cited by: 5
Authors
Han, Yunfei [1 ]
Tan, Shan [1 ]
Affiliation
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
Source
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2022
Keywords
DOI
10.1109/ICPR56361.2022.9956717
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Online Action Detection (OAD) has attracted increasing attention in recent years. A network for OAD generally consists of three parts: a frame-level feature extractor, a temporal modeling module, and an action classifier. Most recent OAD networks use a single-channel Recurrent Neural Network (RNN) to capture long-term history information, with spatial and temporal features concatenated as the network input. In OAD, spatial features describe object appearance and scene configuration within each frame, while temporal features capture motion cues over time. Fusing the two kinds of features effectively is therefore crucial. In this paper, we propose a new framework named TwinLSTM, based on a two-channel Long Short-Term Memory (LSTM) network for OAD, in which each channel extracts and handles either spatial or temporal features. To fuse the two streams more effectively, we design a prediction fusion module (PFM) that uses the hidden states of both channels to obtain richer action context through information interaction and future context prediction. We evaluate TwinLSTM on two challenging datasets: THUMOS14 and HDD. Experiments show that TwinLSTM outperforms existing single-channel models by a significant margin. We also demonstrate the effectiveness of PFM through comprehensive ablation studies.
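To make the two-channel idea in the abstract concrete, the following is a minimal PyTorch sketch: one LSTM per feature stream (spatial and temporal) followed by a placeholder fusion step over their hidden states and a per-frame classifier. All layer sizes, the concatenation-based fusion, and the module names are illustrative assumptions made here; the paper's actual PFM (with information interaction and future context prediction) and training setup are not reproduced.

    # Minimal sketch of a two-channel LSTM for online action detection.
    # Dimensions, fusion design, and class count are assumptions, not the paper's.
    import torch
    import torch.nn as nn

    class TwinLSTMSketch(nn.Module):
        def __init__(self, spat_dim=2048, temp_dim=1024, hidden=512, num_classes=21):
            super().__init__()
            # One LSTM channel per feature stream.
            self.spatial_lstm = nn.LSTM(spat_dim, hidden, batch_first=True)
            self.temporal_lstm = nn.LSTM(temp_dim, hidden, batch_first=True)
            # Placeholder fusion: combine the two hidden-state sequences.
            self.fusion = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
            # Per-frame action scores for online (causal) detection.
            self.classifier = nn.Linear(hidden, num_classes)

        def forward(self, spatial_feats, temporal_feats):
            # spatial_feats: (batch, time, spat_dim); temporal_feats: (batch, time, temp_dim)
            h_s, _ = self.spatial_lstm(spatial_feats)
            h_t, _ = self.temporal_lstm(temporal_feats)
            fused = self.fusion(torch.cat([h_s, h_t], dim=-1))
            return self.classifier(fused)

    if __name__ == "__main__":
        model = TwinLSTMSketch()
        s = torch.randn(2, 16, 2048)   # e.g. appearance features per frame
        t = torch.randn(2, 16, 1024)   # e.g. motion (optical-flow) features per frame
        print(model(s, t).shape)       # torch.Size([2, 16, 21])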
Pages: 3310-3317
Number of pages: 8