Video Action Recognition Based on Spatio-temporal Feature Pyramid Module

Cited by: 1
Authors
Gong, Suming [1]
Chen, Ying [1]
Affiliations
[1] Jiangnan Univ, Minist Educ, Key Lab Adv Proc Control Light Ind, Wuxi, Jiangsu, Peoples R China
Source
2020 13TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2020) | 2020
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Dilated convolution; Spatiotemporal feature pyramid;
DOI
10.1109/ISCID51228.2020.00082
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Modeling the spatio-temporal information of different actions facilitates their recognition. Mainstream 2D convolutional networks have low computational cost but cannot capture temporal information; mainstream 3D convolutional networks can extract spatio-temporal features but are computationally expensive and difficult to deploy. In this paper, a Spatiotemporal Feature Pyramid Module (STFPM) is proposed to extract spatio-temporal feature information. STFPM captures temporal information between frames by dilated convolution and fuses feature information by weighted addition. STFPM can be flexibly inserted into a 2D backbone network in a plug-and-play manner. When equipped with STFPM, 2D ResNet-50 achieves good results on the UCF101 and HMDB51 datasets.
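The abstract names two mechanisms: dilated convolution along the time axis and fusion by weighted addition. The NumPy sketch below illustrates that general idea only and is not the authors' implementation. The function names, the dilation rates (1, 2, 3), the single temporal kernel shared across channels and branches, and the softmax normalization of the fusion weights are all assumptions made for illustration; the record does not specify any of them.

```python
import numpy as np

def dilated_temporal_conv(x, kernel, dilation):
    """Apply a 1D dilated convolution along time with zero padding.

    x:       (T, C) array of per-frame features.
    kernel:  odd-length sequence of temporal taps, shared across channels
             (an illustrative simplification).
    Returns an array of the same shape as x.
    """
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    K = len(kernel)
    pad = dilation * (K - 1) // 2          # keeps T unchanged for odd K
    padded = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for k in range(K):
        start = k * dilation
        out += kernel[k] * padded[start:start + T]
    return out

def pyramid_fuse(x, kernel, dilations=(1, 2, 3), logits=None):
    """Run one branch per dilation rate and fuse them by weighted addition.

    The fusion weights are a softmax over `logits` (hypothetical choice),
    so they are positive and sum to one.
    """
    if logits is None:
        logits = np.zeros(len(dilations))
    logits = np.asarray(logits, dtype=float)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    branches = [dilated_temporal_conv(x, kernel, d) for d in dilations]
    return sum(wi * b for wi, b in zip(w, branches))

# Example: 8 frames, 4 channels; the output keeps the input shape, which is
# what lets a module of this kind sit between layers of a 2D backbone.
features = np.random.default_rng(0).normal(size=(8, 4))
fused = pyramid_fuse(features, kernel=[0.25, 0.5, 0.25])
```

Larger dilation rates let a branch relate frames that are further apart without adding parameters, and a module that preserves the input shape can be inserted without changing the surrounding layers.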
Pages: 338-341
Number of pages: 4