Evaluating the Performance of Mobile-Convolutional Neural Networks for Spatial and Temporal Human Action Recognition Analysis

Cited by: 4
Authors
Moutsis, Stavros N. [1 ]
Tsintotas, Konstantinos A. [1 ]
Kansizoglou, Ioannis [1 ]
Gasteratos, Antonios [1 ]
Affiliations
[1] Democritus Univ Thrace, Dept Prod & Management Engn, 12 Vas Sophias, GR-67132 Xanthi, Greece
Keywords
human action recognition; mobile-CNNs; spatial analysis; RNNs; temporal analysis; PLACE RECOGNITION; GOING DEEPER
DOI
10.3390/robotics12060167
Chinese Library Classification
TP24 [Robotics]
Subject Classification Codes
080202; 1405
Abstract
Human action recognition is a computer vision task that identifies how a person or a group acts in a video sequence. Various methods relying on deep-learning techniques, such as two- and three-dimensional convolutional neural networks (2D-CNNs, 3D-CNNs), recurrent neural networks (RNNs), and vision transformers (ViTs), have been proposed to address this problem over the years. Motivated by the fact that most CNNs used for human action recognition present high complexity, and by the need for implementations on mobile platforms with restricted computational resources, in this article we conduct an extensive evaluation protocol over the performance metrics of five lightweight architectures. In particular, we examine how these mobile-oriented CNNs (viz., ShuffleNet-v2, EfficientNet-b0, MobileNet-v3, and GhostNet) perform in spatial analysis compared with a recent tiny ViT, namely EVA-02-Ti, and a model of higher computational cost, ResNet-50. Our models, previously trained on ImageNet and BU101, are measured for their classification accuracy on HMDB51, UCF101, and six classes of the NTU dataset. The average and max scores, as well as voting approaches, are generated over three and fifteen RGB frames of each video, while two different dropout rates are assessed during training. Last, a temporal analysis via multiple types of RNNs that operate on features extracted by the trained networks is examined. Our results reveal that EfficientNet-b0 and EVA-02-Ti surpass the other mobile-CNNs, achieving comparable or superior performance to ResNet-50.
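To make the evaluated pipeline concrete, the following minimal sketch illustrates the two stages described in the abstract: frame-level spatial classification fused by average, max, and majority voting, and temporal classification with an RNN over per-frame CNN features. This is not the authors' released code; the torchvision MobileNet-v3-Small backbone, the LSTM head, NUM_CLASSES, NUM_FRAMES, and all hyperparameters are illustrative assumptions.

# Hypothetical sketch (not the paper's implementation): frame-level score fusion
# and an LSTM over per-frame CNN features, using MobileNet-v3-Small as a stand-in
# backbone for the mobile-CNNs compared in the article.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_small

NUM_CLASSES = 51          # e.g., HMDB51; assumed for illustration
NUM_FRAMES = 15           # the paper evaluates 3- and 15-frame settings

# Backbone with its 1000-way ImageNet head replaced by an action-class head.
backbone = mobilenet_v3_small(weights=None)
feat_dim = backbone.classifier[0].in_features            # 576 for MobileNet-v3-Small
backbone.classifier[-1] = nn.Linear(backbone.classifier[-1].in_features, NUM_CLASSES)
backbone.eval()

@torch.no_grad()
def spatial_predictions(frames: torch.Tensor) -> dict:
    """frames: (T, 3, 224, 224) RGB frames sampled from one video.
    Returns the video-level class given by average, max, and voting fusion."""
    probs = backbone(frames).softmax(dim=1)               # (T, NUM_CLASSES) per-frame scores
    avg_pred = probs.mean(dim=0).argmax().item()          # average score over frames
    max_pred = probs.max(dim=0).values.argmax().item()    # max score over frames
    vote_pred = probs.argmax(dim=1).mode().values.item()  # majority vote of per-frame labels
    return {"average": avg_pred, "max": max_pred, "voting": vote_pred}

@torch.no_grad()
def frame_features(frames: torch.Tensor) -> torch.Tensor:
    """Global-average-pooled backbone features per frame: (T, feat_dim)."""
    return backbone.avgpool(backbone.features(frames)).flatten(1)

# Temporal analysis: an RNN (here an LSTM) over per-frame features extracted by
# the trained backbone, standing in for the recurrent models examined in the paper.
class TemporalClassifier(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 256, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, in_dim) sequences of frame features
        _, (h_n, _) = self.rnn(feats)
        return self.head(h_n[-1])                          # classify from the last hidden state

# Usage sketch with random tensors standing in for preprocessed video frames.
video = torch.rand(NUM_FRAMES, 3, 224, 224)
print(spatial_predictions(video))
temporal_model = TemporalClassifier(feat_dim)
logits = temporal_model(frame_features(video).unsqueeze(0))  # (1, NUM_CLASSES)

Fusing per-frame scores keeps the spatial stage purely 2D and cheap, which is why sampling only 3 or 15 frames per video is viable on mobile hardware; the recurrent head is the separate, optional step that adds temporal context on top of the same frame features.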
Pages: 30
Related Papers
50 records in total
  • [1] Action Recognition Based on Spatial Temporal Graph Convolutional Networks
    Zheng, Wanqiang
    Jing, Punan
    Xu, Qingyang
    PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND APPLICATION ENGINEERING (CSAE2019), 2019,
  • [2] Human Action Recognition based on Convolutional Neural Networks with a Convolutional Auto-Encoder
    Geng, Chi
    Song, JianXin
    PROCEEDINGS OF THE 2015 5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCES AND AUTOMATION ENGINEERING, 2016, 42 : 933 - 938
  • [3] Human action recognition based on quaternion spatial-temporal convolutional neural network and LSTM in RGB videos
    Meng, Bo
    Liu, XueJun
    Wang, Xiaolin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (20) : 26901 - 26918
  • [4] Human action recognition using genetic algorithms and convolutional neural networks
    Ijjina, Earnest Paul
    Chalavadi, Krishna Mohan
    PATTERN RECOGNITION, 2016, 59 : 199 - 212
  • [5] Exploring hybrid spatio-temporal convolutional networks for human action recognition
    Wang, Hao
    Yang, Yanhua
    Yang, Erkun
    Deng, Cheng
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (13) : 15065 - 15081
  • [6] Human action recognition based on convolutional neural network and spatial pyramid representation
    Xiao, Jihai
    Cui, Xiaohong
    Li, Feng
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 71 (71)
  • [7] Stratified pooling based deep convolutional neural networks for human action recognition
    Yu, Sheng
    Cheng, Yun
    Su, Songzhi
    Cai, Guorong
    Li, Shaozi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (11) : 13367 - 13382
  • [8] Human action recognition using Lie Group features and convolutional neural networks
    Cai, Linqin
    Liu, Chengpeng
    Yuan, Rongdi
    Ding, Heen
    NONLINEAR DYNAMICS, 2020, 99 (04) : 3253 - 3263