A methodology for semantic action recognition based on pose and human-object interaction in avocado harvesting processes

Cited: 10
Authors
Vasconez, J. P. [1 ]
Admoni, H. [2 ]
Auat Cheein, F. [3 ]
Affiliations
[1] Escuela Politec Nacl, Artificial Intelligence & Comp Vis Res Lab, Quito 170517, Ecuador
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Univ Tecn Federico Santa Maria, Dept Elect Engn, Valparaiso, Chile
Keywords
Semantic human action recognition; Human-object interaction; Avocado harvesting process; Human-machine collaboration; AGRICULTURE; PRODUCTS;
DOI
10.1016/j.compag.2021.106057
Chinese Library Classification (CLC) Number
S [Agricultural Sciences];
Subject Classification Code
09;
Abstract
The agricultural industry could greatly benefit from an intelligent system capable of supporting field workers to increase production. Such a system would need to monitor human workers, their current actions, their intentions, and possible future actions, which are the focus of this work. Herein, we propose and validate a methodology to recognize human actions during the avocado harvesting process on a Chilean farm, based on combined object-pose semantic information extracted from RGB still images. We use a Faster R-CNN (Region-based Convolutional Neural Network) object detector with an Inception V2 backbone to recognize 17 categories, which include, among others, field workers, tools, crops, and vehicles. We then use OpenPose, a convolutional 2D pose estimation method, to detect 18 human skeleton joints. Both the object and the pose features are processed, normalized, and combined into a single feature vector. We test four classifiers (Support Vector Machine, Decision Trees, K-Nearest-Neighbour, and Bagged Trees) on the combined object-pose feature vectors to evaluate action classification performance. We also evaluate the four classifiers after applying principal component analysis to reduce dimensionality. Accuracy and inference time are analyzed for all classifiers over 10 action categories related to the avocado harvesting process. The results show that it is possible to detect human actions during harvesting, with average accuracy (across all action categories) ranging from 57% to 99%, depending on the classifier used. These results can support intelligent systems, such as robots, that interact with field workers to increase productivity.
Pages: 12
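
The classification stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: real object scores from Faster R-CNN and keypoints from OpenPose are mocked here with random data, and the scikit-learn classifiers, PCA variance threshold, and train/test split are illustrative assumptions. Only the dimensions (17 object categories, 18 joints, 10 action classes) come from the abstract.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

N_SAMPLES = 1000          # labeled still images (illustrative)
N_OBJECT_FEATURES = 17    # one score per detected object category
N_POSE_FEATURES = 18 * 2  # (x, y) for each of the 18 OpenPose joints
N_ACTIONS = 10            # harvesting-related action categories

# Stand-ins for the upstream models: in the paper these vectors come
# from Faster R-CNN (Inception V2) detections and OpenPose keypoints.
object_features = rng.random((N_SAMPLES, N_OBJECT_FEATURES))
pose_features = rng.random((N_SAMPLES, N_POSE_FEATURES))
labels = rng.integers(0, N_ACTIONS, size=N_SAMPLES)

# Combine both modalities into a single feature vector per image.
X = np.hstack([object_features, pose_features])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0)

classifiers = {
    "SVM": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Bagged trees": BaggingClassifier(DecisionTreeClassifier()),
}

for name, clf in classifiers.items():
    # Normalize the combined features and (optionally) reduce their
    # dimensionality with PCA, mirroring the PCA variant the paper tests.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95), clf)
    pipe.fit(X_train, y_train)
    print(f"{name}: accuracy = {pipe.score(X_test, y_test):.2f}")

With real detections in place of the random arrays, the same loop reproduces the paper's comparison axis: per-classifier accuracy (and, with timing added, inference cost) over the 10 action categories.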