Evaluating the Performance of Mobile-Convolutional Neural Networks for Spatial and Temporal Human Action Recognition Analysis

Times cited: 5
Authors
Moutsis, Stavros N. [1]
Tsintotas, Konstantinos A. [1]
Kansizoglou, Ioannis [1]
Gasteratos, Antonios [1]
Affiliation
[1] Democritus Univ Thrace, Dept Prod & Management Engn, 12 Vas Sophias, GR-67132 Xanthi, Greece
Keywords
human action recognition; mobile-CNNs; spatial analysis; RNNs; temporal analysis; place recognition; going deeper
DOI
10.3390/robotics12060167
Chinese Library Classification (CLC) number
TP24 [Robotics technology]
Discipline classification codes
080202; 1405
Abstract
Human action recognition is a computer vision task that identifies how a person or a group acts in a video sequence. Various methods that rely on deep-learning techniques, such as two- or three-dimensional convolutional neural networks (2D-CNNs, 3D-CNNs), recurrent neural networks (RNNs), and vision transformers (ViT), have been proposed to address this problem over the years. Motivated by the high complexity of most CNNs used in human action recognition and by the need for implementations on mobile platforms with restricted computational resources, in this article we conduct an extensive evaluation of the performance of five lightweight architectures. In particular, we examine how four mobile-oriented CNNs (viz., ShuffleNet-v2, EfficientNet-b0, MobileNet-v3, and GhostNet) perform in spatial analysis compared to a recent tiny ViT, namely EVA-02-Ti, and a more computationally demanding model, ResNet-50. Our models, previously trained on ImageNet and BU101, are evaluated for their classification accuracy on HMDB51, UCF101, and six classes of the NTU dataset. Average and max scores, as well as voting approaches, are computed over three and fifteen RGB frames of each video, and two different dropout rates were assessed during training. Last, we examine a temporal analysis via multiple types of RNNs that employ features extracted by the trained networks. Our results reveal that EfficientNet-b0 and EVA-02-Ti surpass the other mobile-CNNs, achieving comparable or superior performance to ResNet-50.
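The abstract mentions turning per-frame predictions from three or fifteen RGB frames into one video-level label via average scores, max scores, and voting. The following is a minimal sketch of those three fusion rules, not the authors' code. Assumptions: frames are sampled at the centre of equal-length segments, each frame yields a list of per-class scores (e.g., softmax outputs), and ties go to the lowest class index. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical helpers illustrating frame-level score fusion for video classification.


def sample_frame_indices(num_video_frames: int, num_samples: int) -> List[int]:
    """Pick one frame at the centre of each of num_samples equal segments (assumed scheme)."""
    if num_video_frames <= 0 or num_samples <= 0:
        raise ValueError("frame counts must be positive")
    step = num_video_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value; Python's max keeps the first of equal maxima."""
    return max(range(len(values)), key=lambda i: values[i])


def average_score(frame_scores: Sequence[Sequence[float]]) -> int:
    """Average each class score over frames, then take the best class."""
    num_classes = len(frame_scores[0])
    means = [sum(f[c] for f in frame_scores) / len(frame_scores) for c in range(num_classes)]
    return argmax(means)


def max_score(frame_scores: Sequence[Sequence[float]]) -> int:
    """Keep each class's highest score over frames, then take the best class."""
    num_classes = len(frame_scores[0])
    peaks = [max(f[c] for f in frame_scores) for c in range(num_classes)]
    return argmax(peaks)


def majority_vote(frame_scores: Sequence[Sequence[float]]) -> int:
    """Each frame votes for its top class; the most-voted class wins (ties: lowest index)."""
    votes = Counter(argmax(f) for f in frame_scores)
    top = max(votes.values())
    return min(c for c, n in votes.items() if n == top)


if __name__ == "__main__":
    print(sample_frame_indices(90, 3))  # [15, 45, 75]
    # Toy per-frame scores for three frames and three classes.
    scores = [[0.6, 0.3, 0.1], [0.1, 0.5, 0.4], [0.2, 0.45, 0.35]]
    print(average_score(scores))  # 1
    print(max_score(scores))      # 0
    print(majority_vote(scores))  # 1
```

The toy example shows why the rules are compared separately: one confident frame makes the max rule pick class 0, while averaging and voting both favour class 1, which is consistently strong across frames.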
Pages: 30