Video Action Recognition Based on Spatio-temporal Feature Pyramid Module

Cited by: 1

Authors:

Gong, Suming [1]

Chen, Ying [1]

Affiliations:

[1] Jiangnan Univ, Minist Educ, Key Lab Adv Proc Control Light Ind, Wuxi, Jiangsu, Peoples R China

Source:

2020 13TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2020) | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

Action recognition; Dilated convolution; Spatiotemporal feature pyramid;

DOI:

10.1109/ISCID51228.2020.00082

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Modeling the spatio-temporal information of different actions facilitates their recognition. Mainstream 2D convolutional networks have low computational cost but cannot capture temporal information; mainstream 3D convolutional networks can extract spatio-temporal features but are computationally expensive and difficult to deploy. In this paper, a Spatiotemporal Feature Pyramid Module (STFPM) is proposed to extract spatio-temporal feature information. STFPM captures temporal information between frames by dilated convolution and fuses feature information by weighted addition. STFPM can be flexibly inserted into a 2D backbone network in a plug-and-play manner. When equipped with STFPM, a 2D ResNet-50 achieves good results on the UCF101 and HMDB51 datasets.
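The abstract names two mechanisms: dilated convolution along the time axis, and fusion by weighted addition. The following is a minimal sketch of how those two steps could fit together, not the authors' implementation. The kernel size, the dilation rates (1, 2, 3), the fixed fusion weights, and the function names `dilated_temporal_conv` and `stfpm_sketch` are all assumptions made for illustration; the abstract does not specify them, and in a trained network the kernels and fusion weights would presumably be learned.

```python
from typing import List, Sequence

Frames = List[List[float]]  # shape [T][C]: one feature vector per frame


def dilated_temporal_conv(x: Frames, kernel: Sequence[float], dilation: int) -> Frames:
    """Per-channel 1D convolution along time, zero-padded so the output keeps T frames."""
    t_len, channels = len(x), len(x[0])
    half = len(kernel) // 2
    out = [[0.0] * channels for _ in range(t_len)]
    for t in range(t_len):
        for k, w in enumerate(kernel):
            src = t + (k - half) * dilation  # frame sampled by this kernel tap
            if 0 <= src < t_len:  # taps outside the clip contribute zero
                for c in range(channels):
                    out[t][c] += w * x[src][c]
    return out


def stfpm_sketch(
    x: Frames,
    kernel: Sequence[float] = (0.25, 0.5, 0.25),      # assumed, not from the paper
    dilations: Sequence[int] = (1, 2, 3),             # assumed pyramid levels
    branch_weights: Sequence[float] = (0.5, 0.3, 0.2),  # assumed fusion weights
) -> Frames:
    """Run one dilated branch per pyramid level, then fuse by weighted addition."""
    branches = [dilated_temporal_conv(x, kernel, d) for d in dilations]
    total = sum(branch_weights)
    fused = [[0.0] * len(x[0]) for _ in range(len(x))]
    for w, branch in zip(branch_weights, branches):
        for t, frame in enumerate(branch):
            for c, value in enumerate(frame):
                fused[t][c] += (w / total) * value  # normalized weighted sum
    return fused
```

Each branch sees a different temporal span: with a three-tap kernel, dilation `d` mixes frames `t - d`, `t`, and `t + d`. Because the output keeps the input shape `[T][C]`, a module of this kind can sit between existing 2D backbone stages without changing the surrounding layers, which is consistent with the plug-and-play property the abstract describes.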

Citation

Pages: 338-341

Page count: 4
