Evaluating the Performance of Mobile-Convolutional Neural Networks for Spatial and Temporal Human Action Recognition Analysis

Times cited: 4

Authors:

Moutsis, Stavros N. [1]

Tsintotas, Konstantinos A. [1]

Kansizoglou, Ioannis [1]

Gasteratos, Antonios [1]

Affiliation:

[1] Democritus Univ Thrace, Dept Prod & Management Engn, 12 Vas Sophias, GR-67132 Xanthi, Greece

Source:

ROBOTICS | 2023 / Vol. 12 / Issue 6

Keywords:

human action recognition; mobile-CNNs; spatial analysis; RNNs; temporal analysis; PLACE RECOGNITION; GOING DEEPER;

DOI:

10.3390/robotics12060167

Chinese Library Classification (CLC) number:

TP24 [Robotics]

Discipline classification code:

080202; 1405

Abstract:

Human action recognition is a computer vision task that identifies how a person or a group acts in a video sequence. Over the years, various deep-learning methods have been proposed to address this problem, such as two- and three-dimensional convolutional neural networks (2D-CNNs, 3D-CNNs), recurrent neural networks (RNNs), and vision transformers (ViTs). Motivated by the high complexity of most CNNs used in human action recognition and by the need for implementations on mobile platforms with restricted computational resources, in this article we conduct an extensive evaluation of the performance metrics of five lightweight architectures. In particular, we examine how four mobile-oriented CNNs (viz., ShuffleNet-v2, EfficientNet-b0, MobileNet-v3, and GhostNet) perform in spatial analysis compared with a recent tiny ViT, namely EVA-02-Ti, and a model of higher computational cost, ResNet-50. Our models, pre-trained on ImageNet and BU101, are evaluated for classification accuracy on HMDB51, UCF101, and six classes of the NTU dataset. Average-score, max-score, and voting predictions are computed from three and fifteen RGB frames of each video, and two different dropout rates were assessed during training. Finally, we examine a temporal analysis in which multiple types of RNNs operate on features extracted by the trained networks. Our results reveal that EfficientNet-b0 and EVA-02-Ti outperform the other mobile-CNNs, achieving comparable or superior performance to ResNet-50.
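The abstract names three ways of turning per-frame predictions into a single video-level label (average score, max score, and voting) over three or fifteen RGB frames per video. The following is a minimal sketch of one common reading of these strategies in plain Python, not the authors' exact procedure. It assumes each sampled frame yields a list of class probabilities from a 2D network; the evenly spaced frame sampling, the per-class max rule, and the lowest-index tie-breaking are illustrative assumptions, and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

Scores = Sequence[Sequence[float]]  # one list of class scores per sampled frame


def sample_frame_indices(total_frames: int, k: int) -> List[int]:
    """Pick k evenly spaced frame indices (assumed sampling scheme, e.g. k = 3 or 15)."""
    if total_frames <= 0 or k <= 0:
        raise ValueError("total_frames and k must be positive")
    step = total_frames / k
    # Take the centre of each of the k equal-length segments of the video.
    return [min(total_frames - 1, int(step * i + step / 2)) for i in range(k)]


def argmax(values: Sequence[float]) -> int:
    # Python's max() returns the first maximal item, so ties go to the lower index.
    return max(range(len(values)), key=lambda i: values[i])


def average_score(frame_scores: Scores) -> int:
    """Average each class score over frames, then pick the top class."""
    n = len(frame_scores)
    num_classes = len(frame_scores[0])
    mean = [sum(f[c] for f in frame_scores) / n for c in range(num_classes)]
    return argmax(mean)


def max_score(frame_scores: Scores) -> int:
    """Take each class's highest score over frames, then pick the top class."""
    num_classes = len(frame_scores[0])
    peak = [max(f[c] for f in frame_scores) for c in range(num_classes)]
    return argmax(peak)


def majority_vote(frame_scores: Scores) -> int:
    """Each frame votes for its top class; the most frequent class wins."""
    votes = Counter(argmax(f) for f in frame_scores)
    best = max(votes.values())
    # Break ties in favour of the lowest class index (an arbitrary choice).
    return min(c for c, v in votes.items() if v == best)


if __name__ == "__main__":
    # Hypothetical softmax outputs for 3 frames over 3 action classes.
    scores = [
        [0.6, 0.3, 0.1],
        [0.2, 0.5, 0.3],
        [0.1, 0.5, 0.4],
    ]
    print(sample_frame_indices(90, 3))  # [15, 45, 75]
    print(average_score(scores), max_score(scores), majority_vote(scores))  # 1 0 1
```

In this toy input, a single confident frame makes the max rule choose class 0, while averaging and voting both favour class 1, which two of the three frames support. Averaging smooths out noisy frames, the max rule rewards one strong detection, and voting ignores confidence entirely.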

Pages: 30
