YOLO Series for Human Hand Action Detection and Classification from Egocentric Videos

Cited by: 11
Authors
Nguyen, Hung-Cuong [1 ]
Nguyen, Thi-Hao [1 ]
Scherer, Rafal [2 ]
Le, Van-Hung [3 ]
Affiliations
[1] Hung Vuong Univ, Fac Engn Technol, Viet Tri City 35100, Vietnam
[2] Czestochowa Tech Univ, Dept Intelligent Comp Syst, PL-42218 Czestochowa, Poland
[3] Tan Trao Univ, Fac Basic Sci, Tuyen Quang City 22000, Vietnam
Keywords
hand detection; hand classification; YOLO-family networks; convolutional neural networks (CNNs); egocentric vision;
DOI
10.3390/s23063255
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
Hand detection and classification is an important pre-processing step in building applications based on three-dimensional (3D) hand pose estimation and hand activity recognition. To automatically limit the hand data area on egocentric vision (EV) datasets, and in particular to trace the development and performance of the "You Only Look Once" (YOLO) network over the past seven years, we present a study comparing the efficiency of hand detection and classification based on the YOLO-family networks. This study addresses the following problems: (1) systematizing the architectures, advantages, and disadvantages of YOLO-family networks from version (v)1 to v7; (2) preparing ground-truth data for pre-trained models and for evaluating hand detection and classification models on EV datasets (FPHAB, HOI4D, RehabHand); (3) fine-tuning the hand detection and classification models based on the YOLO-family networks and evaluating hand detection and classification on the EV datasets. Hand detection and classification results of the YOLOv7 network and its variations were the best across all three datasets. The results of the YOLOv7-w6 network are as follows: on FPHAB, P = 97% with Thresh(IoU) = 0.5; on HOI4D, P = 95% with Thresh(IoU) = 0.5; on RehabHand, P > 95% with Thresh(IoU) = 0.5. The processing speed of YOLOv7-w6 is 60 fps at a resolution of 1280 × 1280 pixels, and that of YOLOv7 is 133 fps at a resolution of 640 × 640 pixels.
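The abstract reports precision (P) at an intersection-over-union (IoU) threshold of 0.5. As an illustration only, the following minimal Python sketch shows how such a figure could be computed for one image. The box format `(x_min, y_min, x_max, y_max)`, the function names `iou` and `precision_at_iou`, and the greedy one-to-one matching rule are assumptions made for this example; the abstract does not describe the authors' exact evaluation protocol.

```python
from typing import List, Tuple

# Assumed box format (not stated in the abstract): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at_iou(preds: List[Box], truths: List[Box],
                     thresh: float = 0.5) -> float:
    """Precision = TP / (TP + FP) under greedy one-to-one matching.

    Assumption: `preds` is already sorted by descending confidence,
    and each ground-truth box can be matched at most once.
    """
    matched = set()
    true_positives = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            value = iou(p, t)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j >= 0 and best_iou >= thresh:
            matched.add(best_j)
            true_positives += 1
    return true_positives / len(preds) if preds else 0.0
```

For example, with one ground-truth hand box and two predictions of which only one overlaps it sufficiently, `precision_at_iou` returns 0.5: one true positive and one false positive.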
Pages: 24
Related papers
4 in total
  • [1] Assisting Group Activity Analysis through Hand Detection and Identification in Multiple Egocentric Videos
    Charoenkulvanich, Nathawan
    Kamikubo, Rie
    Yonetani, Ryo
    Sato, Yoichi
    PROCEEDINGS OF IUI 2019, 2019: 570-574
  • [2] Egocentric Hand Track and Object-based Human Action Recognition
    Kapidis, Georgios
    Poppe, Ronald
    van Dam, Elsbeth
    Noldus, Lucas P. J. J.
    Veltkamp, Remco C.
    2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019), 2019: 922-929
  • [3] A Hybrid Approach to Hand Detection and Type Classification in Upper-Body Videos
    Papadimitriou, Katerina
    Potamianos, Gerasimos
    PROCEEDINGS OF THE 2018 7TH EUROPEAN WORKSHOP ON VISUAL INFORMATION PROCESSING (EUVIP), 2018
  • [4] Human functional pattern recognition with focus on hand detection in egocentric vision through an artificial intelligence approach
    Yemisi Babatope, Eyitomilayo
    Lopez-Rodriguez, Mario
    Alejandro Acosta-Franco, Jesus
    Sarai Garcia-Vazquez, Mireya
    Ramirez-Acosta, Alejandro Alvaro
    OPTICS AND PHOTONICS FOR INFORMATION PROCESSING XVI, 2022, 12225