Evaluating the Performance of Mobile-Convolutional Neural Networks for Spatial and Temporal Human Action Recognition Analysis

Cited by: 5
Authors
Moutsis, Stavros N. [1 ]
Tsintotas, Konstantinos A. [1 ]
Kansizoglou, Ioannis [1 ]
Gasteratos, Antonios [1 ]
Affiliations
[1] Democritus Univ Thrace, Dept Prod & Management Engn, 12 Vas Sophias, GR-67132 Xanthi, Greece
Keywords
human action recognition; mobile-CNNs; spatial analysis; RNNs; temporal analysis; PLACE RECOGNITION; GOING DEEPER;
DOI
10.3390/robotics12060167
Chinese Library Classification
TP24 [Robotics];
Subject Classification Codes
080202; 1405
Abstract
Human action recognition is a computer vision task that identifies how a person or a group acts in a video sequence. Various methods relying on deep-learning techniques, such as two- or three-dimensional convolutional neural networks (2D-CNNs, 3D-CNNs), recurrent neural networks (RNNs), and vision transformers (ViTs), have been proposed to address this problem over the years. Motivated by the high complexity of most CNNs used in human action recognition, and by the need for implementations on mobile platforms with restricted computational resources, in this article, we conduct an extensive evaluation protocol over the performance metrics of five lightweight architectures. In particular, we examine how these mobile-oriented CNNs (viz., ShuffleNet-v2, EfficientNet-b0, MobileNet-v3, and GhostNet) perform in spatial analysis compared to a recent tiny ViT, namely EVA-02-Ti, and a higher-complexity model, ResNet-50. Our models, previously trained on ImageNet and BU101, are evaluated for classification accuracy on HMDB51, UCF101, and six classes of the NTU dataset. Average-score, max-score, and voting approaches are computed over three and fifteen RGB frames of each video, while two different rates for the dropout layers are assessed during training. Finally, a temporal analysis via multiple types of RNNs that employ features extracted by the trained networks is examined. Our results reveal that EfficientNet-b0 and EVA-02-Ti surpass the other mobile-CNNs, achieving comparable or superior performance to ResNet-50.
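The average-score, max-score, and voting schemes mentioned in the abstract are standard ways of fusing per-frame classifier outputs into one video-level prediction. A minimal sketch of the three aggregation rules, assuming each sampled RGB frame has already been scored by a CNN into a vector of class probabilities (the toy 3-frame, 3-class array below is illustrative, not data from the paper):

```python
import numpy as np

def average_score(frame_probs):
    """Average the per-frame class probabilities, then pick the top class."""
    return int(np.argmax(frame_probs.mean(axis=0)))

def max_score(frame_probs):
    """Take the per-class maximum over all frames, then pick the top class."""
    return int(np.argmax(frame_probs.max(axis=0)))

def majority_vote(frame_probs):
    """Classify each frame independently, then return the most frequent label."""
    per_frame_labels = np.argmax(frame_probs, axis=1)
    return int(np.bincount(per_frame_labels,
                           minlength=frame_probs.shape[1]).argmax())

# Hypothetical softmax outputs for 3 sampled frames over 3 action classes.
probs = np.array([
    [0.1, 0.8,  0.1],   # frame 1 favors class 1
    [0.1, 0.7,  0.2],   # frame 2 favors class 1
    [0.9, 0.05, 0.05],  # frame 3 strongly favors class 0
])

print(average_score(probs))  # → 1
print(max_score(probs))      # → 0 (one confident frame dominates)
print(majority_vote(probs))  # → 1
```

Note how the max-score rule can disagree with the other two: a single highly confident frame outvotes a consistent majority, which is why the three fusion strategies are evaluated separately in the paper.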
Pages: 30