Evaluating the Performance of Mobile-Convolutional Neural Networks for Spatial and Temporal Human Action Recognition Analysis

Cited by: 4
Authors
Moutsis, Stavros N. [1 ]
Tsintotas, Konstantinos A. [1 ]
Kansizoglou, Ioannis [1 ]
Gasteratos, Antonios [1 ]
Affiliations
[1] Democritus Univ Thrace, Dept Prod & Management Engn, 12 Vas Sophias, GR-67132 Xanthi, Greece
Keywords
human action recognition; mobile-CNNs; spatial analysis; RNNs; temporal analysis; PLACE RECOGNITION; GOING DEEPER
DOI
10.3390/robotics12060167
Chinese Library Classification
TP24 [Robotics]
Subject Classification Codes
080202; 1405
Abstract
Human action recognition is a computer vision task that identifies how a person or a group acts in a video sequence. Various methods relying on deep-learning techniques, such as two- and three-dimensional convolutional neural networks (2D-CNNs, 3D-CNNs), recurrent neural networks (RNNs), and vision transformers (ViTs), have been proposed to address this problem over the years. Motivated by the fact that most CNNs used for human action recognition present high complexity, and by the need for implementations on mobile platforms with restricted computational resources, in this article we conduct an extensive evaluation protocol over the performance metrics of five lightweight architectures. In particular, we examine how these mobile-oriented CNNs (viz., ShuffleNet-v2, EfficientNet-b0, MobileNet-v3, and GhostNet) perform in spatial analysis compared with a recent tiny ViT, namely EVA-02-Ti, and a model of higher computational cost, ResNet-50. Our models, previously trained on ImageNet and BU101, are measured for their classification accuracy on HMDB51, UCF101, and six classes of the NTU dataset. The average and max scores, as well as voting approaches, are generated over three and fifteen RGB frames of each video, while two different dropout rates are assessed during training. Last, a temporal analysis via multiple types of RNNs that operate on features extracted by the trained networks is examined. Our results reveal that EfficientNet-b0 and EVA-02-Ti surpass the other mobile-CNNs, achieving comparable or superior performance to ResNet-50.
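To make the evaluated pipeline concrete, the following minimal sketch illustrates the two stages described in the abstract: frame-level spatial classification fused by average, max, and majority voting, and temporal classification with an RNN over per-frame CNN features. This is not the authors' released code; the torchvision MobileNet-v3-Small backbone, the LSTM head, NUM_CLASSES, NUM_FRAMES, and all hyperparameters are illustrative assumptions.

# Hypothetical sketch (not the paper's implementation): frame-level score fusion
# and an LSTM over per-frame CNN features, using MobileNet-v3-Small as a stand-in
# backbone for the mobile-CNNs compared in the article.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_small

NUM_CLASSES = 51          # e.g., HMDB51; assumed for illustration
NUM_FRAMES = 15           # the paper evaluates 3- and 15-frame settings

# Backbone with its 1000-way ImageNet head replaced by an action-class head.
backbone = mobilenet_v3_small(weights=None)
feat_dim = backbone.classifier[0].in_features            # 576 for MobileNet-v3-Small
backbone.classifier[-1] = nn.Linear(backbone.classifier[-1].in_features, NUM_CLASSES)
backbone.eval()

@torch.no_grad()
def spatial_predictions(frames: torch.Tensor) -> dict:
    """frames: (T, 3, 224, 224) RGB frames sampled from one video.
    Returns the video-level class given by average, max, and voting fusion."""
    probs = backbone(frames).softmax(dim=1)               # (T, NUM_CLASSES) per-frame scores
    avg_pred = probs.mean(dim=0).argmax().item()          # average score over frames
    max_pred = probs.max(dim=0).values.argmax().item()    # max score over frames
    vote_pred = probs.argmax(dim=1).mode().values.item()  # majority vote of per-frame labels
    return {"average": avg_pred, "max": max_pred, "voting": vote_pred}

@torch.no_grad()
def frame_features(frames: torch.Tensor) -> torch.Tensor:
    """Global-average-pooled backbone features per frame: (T, feat_dim)."""
    return backbone.avgpool(backbone.features(frames)).flatten(1)

# Temporal analysis: an RNN (here an LSTM) over per-frame features extracted by
# the trained backbone, standing in for the recurrent models examined in the paper.
class TemporalClassifier(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 256, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, in_dim) sequences of frame features
        _, (h_n, _) = self.rnn(feats)
        return self.head(h_n[-1])                          # classify from the last hidden state

# Usage sketch with random tensors standing in for preprocessed video frames.
video = torch.rand(NUM_FRAMES, 3, 224, 224)
print(spatial_predictions(video))
temporal_model = TemporalClassifier(feat_dim)
logits = temporal_model(frame_features(video).unsqueeze(0))  # (1, NUM_CLASSES)

Fusing per-frame scores keeps the spatial stage purely 2D and cheap, which is why sampling only 3 or 15 frames per video is viable on mobile hardware; the recurrent head is the separate, optional step that adds temporal context on top of the same frame features.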
Pages: 30
Related Papers
50 records in total
  • [1] Action Recognition Based on Spatial Temporal Graph Convolutional Networks
    Zheng, Wanqiang
    Jing, Punan
    Xu, Qingyang
    PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND APPLICATION ENGINEERING (CSAE2019), 2019,
  • [2] Human Action Recognition based on Convolutional Neural Networks with a Convolutional Auto-Encoder
    Geng, Chi
    Song, JianXin
    PROCEEDINGS OF THE 2015 5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCES AND AUTOMATION ENGINEERING, 2016, 42 : 933 - 938
  • [3] Human action recognition based on quaternion spatial-temporal convolutional neural network and LSTM in RGB videos
    Meng, Bo
    Liu, XueJun
    Wang, Xiaolin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (20) : 26901 - 26918
  • [4] Human action recognition using genetic algorithms and convolutional neural networks
    Ijjina, Earnest Paul
    Chalavadi, Krishna Mohan
    PATTERN RECOGNITION, 2016, 59 : 199 - 212
  • [5] Exploring hybrid spatio-temporal convolutional networks for human action recognition
    Wang, Hao
    Yang, Yanhua
    Yang, Erkun
    Deng, Cheng
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (13) : 15065 - 15081
  • [6] Human action recognition based on convolutional neural network and spatial pyramid representation
    Xiao, Jihai
    Cui, Xiaohong
    Li, Feng
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 71 (71)
  • [7] Stratified pooling based deep convolutional neural networks for human action recognition
    Yu, Sheng
    Cheng, Yun
    Su, Songzhi
    Cai, Guorong
    Li, Shaozi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (11) : 13367 - 13382
  • [8] Human action recognition using Lie Group features and convolutional neural networks
    Cai, Linqin
    Liu, Chengpeng
    Yuan, Rongdi
    Ding, Heen
    NONLINEAR DYNAMICS, 2020, 99 (04) : 3253 - 3263