ESTI: an action recognition network with enhanced spatio-temporal information

Cited by: 3

Authors:

Jiang, ZhiYu [1]

Zhang, Yi [1]

Hu, Shu [1]

Affiliations:

[1] Sichuan Univ, Coll Comp Sci, Chengdu 610000, Peoples R China

Source:

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS | 2023 / Vol. 14 / Issue 09

Keywords:

Action recognition; Feature enhancement; Global multi-scale feature; Local motion extraction; Spatio-temporal information;

DOI:

10.1007/s13042-023-01820-x

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Action recognition is an active topic in video understanding that aims to recognize human actions in videos. The critical step is to model the spatio-temporal information and extract key action clues. To this end, we propose a simple and efficient network (dubbed ESTI) that consists of two core modules. The Local Motion Extraction module highlights the short-term temporal context, while the Global Multi-scale Feature Enhancement module strengthens the spatio-temporal and channel features to model long-term information. By appending ESTI to a 2D ResNet backbone, our network is capable of reasoning about different kinds of actions with various amplitudes in videos. Our network is developed on two GeForce RTX 3090 GPUs using Python 3.7 and PyTorch 1.8. Extensive experiments have been conducted on five mainstream datasets to verify the effectiveness of our network, in which ESTI outperforms most state-of-the-art methods in terms of accuracy, computational cost, and network scale. In addition, we visualize the feature representation of our model using Grad-CAM to validate its accuracy.

Pages: 3059-3070

Page count: 12

References: 48 in total

