Skeleton-Based Action Recognition with Joint Coordinates as Feature Using Neural Oblivious Decision Ensembles

Cited by: 0
Authors
Nasrul'Alam, Fakhrul Aniq Hakimi [1]
Shapiai, Mohd Ibrahim [1]
Batool, Uzma [2]
Ramli, Ahmad Kamal [3]
Elias, Khairil Ashraf [4]
Affiliations
[1] Univ Teknol Malaysia, Ctr Artificial Intelligence & Robot iKohza, Malaysia Japan Int Inst Technol, Kuala Lumpur, Malaysia
[2] Univ Wah, Dept Comp Sci, Wah Cantt, Pakistan
[3] Univ Teknol MARA Pahang, Fac Comp & Math Sci, Bandar Tun Razak, Malaysia
[4] Selangor Int Islamic Univ Coll, Fac Informat Sci & Technol, Kajang, Malaysia
Source
NEW TRENDS IN INTELLIGENT SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES | 2021 / Vol. 337
Keywords
behavior recognition; pose estimation; skeleton; joint coordinates; structured data; Neural Oblivious Decision Ensemble (NODE);
DOI
10.3233/FAIA210037
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognition of human behavior is critical in video monitoring, human-computer interaction, video comprehension, and virtual reality. The key problem with behavior recognition in video surveillance is the high degree of variation between and within subjects. Numerous studies have suggested background-insensitive, skeleton-based methods as a proven detection technique. The present state-of-the-art approaches to skeleton-based action recognition rely primarily on Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN). Both methods take the dynamic human skeleton as the input to the network. We chose to handle skeleton data differently, relying solely on the skeleton joint coordinates as the input. The skeleton joints' positions are defined in (x, y) coordinates. In this paper, we investigate the incorporation of the Neural Oblivious Decision Ensemble (NODE) into our proposed action classifier network. The skeleton is extracted using a pose estimation technique based on the Residual Network (ResNet), which extracts the 2D skeleton of 18 joints for each detected body. The joint coordinates of the skeleton are stored in a table in the form of rows and columns. Each row represents the position of the joints. The structured data are fed into NODE for label prediction. With the proposed network, we obtain 97.5% accuracy on the RealWorld (HAR) dataset. Experimental results show that the proposed network outperforms one of the state-of-the-art approaches by 1.3%. In conclusion, NODE is a promising deep learning technique for structured data analysis compared to its machine learning counterparts such as the GBDT packages CatBoost and XGBoost.
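The tabular representation described in the abstract can be illustrated with a minimal sketch. It assumes one row per detected skeleton and 18 joints with (x, y) coordinates each, giving 36 feature columns. The function name `skeletons_to_table`, the column naming scheme, and the random demo data are hypothetical and do not come from the paper. The pose estimation and NODE training steps are not shown.

```python
import numpy as np

NUM_JOINTS = 18  # from the abstract: 18 joints per detected body


def skeletons_to_table(skeletons):
    """Flatten (n_samples, 18, 2) joint coordinates into (n_samples, 36) rows.

    Hypothetical helper: assumes each table row holds one skeleton, with
    columns ordered joint0_x, joint0_y, joint1_x, joint1_y, and so on.
    """
    arr = np.asarray(skeletons, dtype=np.float32)
    if arr.ndim != 3 or arr.shape[1:] != (NUM_JOINTS, 2):
        raise ValueError(f"expected shape (n, {NUM_JOINTS}, 2), got {arr.shape}")
    columns = [f"joint{j}_{axis}" for j in range(NUM_JOINTS) for axis in ("x", "y")]
    # Row-major reshape keeps each joint's x and y next to each other.
    return arr.reshape(len(arr), -1), columns


# Placeholder data standing in for pose-estimation output (not real results).
rng = np.random.default_rng(0)
demo = rng.uniform(0.0, 1.0, size=(4, NUM_JOINTS, 2))
table, columns = skeletons_to_table(demo)
```

A table in this form, paired with one action label per row, is the kind of structured input that a tabular learner such as NODE or a GBDT package would consume.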
Pages: 380-392
Page count: 13
Related papers
27 in total
  • [1] Bao J., 2016, 2016 IEEE 13 INT C S
  • [2] Survey of Pedestrian Action Recognition Techniques for Autonomous Driving
    Chen, Li
    Ma, Nan
    Wang, Patrick
    Li, Jiahong
    Wang, Pengfei
    Pang, Guilin
    Shi, Xiaojun
    [J]. TSINGHUA SCIENCE AND TECHNOLOGY, 2020, 25 (04) : 458 - 470
  • [3] Chen T., 2016, arXiv
  • [4] Chou ED, 2018, Arxiv, DOI arXiv:1811.09950
  • [5] Du Y, 2015, PROCEEDINGS 3RD IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION ACPR 2015, P579, DOI 10.1109/ACPR.2015.7486569
  • [6] Du Y, 2015, PROC CVPR IEEE, P1110, DOI 10.1109/CVPR.2015.7298714
  • [7] Feng J, 2018, Arxiv, DOI arXiv:1806.00007
  • [8] Greedy function approximation: A gradient boosting machine
    Friedman, JH
    [J]. ANNALS OF STATISTICS, 2001, 29 (05) : 1189 - 1232
  • [9] Grabczewski K, 2005, HIS 2005: 5TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS, PROCEEDINGS, P212
  • [10] Ho TK, 1998, IEEE T PATTERN ANAL, V20, P832, DOI 10.1109/34.709601