A hybrid network for estimating 3D interacting hand pose from a single RGB image

Times cited: 0
Authors
Bao, Wenxia [1]
Gao, Qiuyue [1]
Yang, Xianjun [2]
Affiliations
[1] Anhui Univ, Sch Elect & Informat Engn, Hefei 230601, Anhui, Peoples R China
[2] Chinese Acad Sci, Hefei Inst Phys Sci, Hefei 230031, Anhui, Peoples R China
Keywords
3D hand pose estimation; Interacting Hand; Hybrid network; End to end network; TEXT; RECOGNITION; KHATT;
DOI
10.1007/s11760-024-03043-1
CLC classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract
Estimating 3D interacting hand pose from a single RGB image is a challenging problem: in two-handed interactions, the hands frequently occlude each other and are visually self-similar. In this study, a simple and accurate end-to-end framework called HybridPoseNet is proposed for estimating 3D interacting hand pose. The network follows an encoder-decoder architecture. Its feature encoder is a hybrid structure that combines a convolutional neural network (CNN) with a transformer to encode hand information: an ordinary CNN extracts local detailed features from the input image, and a vision transformer captures long-range spatial interactions between feature vectors at different positions. The 3D pose decoder is built on left and right network branches, which are fused via a feature enhancement module (FEM). The FEM helps reduce the appearance ambiguity caused by the self-similarity of the hands. The decoder lifts the 2D pose to a 3D pose by estimating two depth components. Ablation experiments demonstrate the effectiveness of each module in the network, and comprehensive experiments on the InterHand2.6M dataset show that the proposed method outperforms previous state-of-the-art methods for interacting hand pose estimation.
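The abstract states only that the decoder lifts the 2D pose to 3D by estimating two depth components; it does not define them. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes the two components are (a) each joint's depth relative to the hand root and (b) the absolute depth of that root, and that pinhole camera intrinsics `fx`, `fy`, `cx`, `cy` are known. The function `lift_to_3d` and all of its parameter names are hypothetical. It shows how such a lift back-projects 2D joints into camera space.

```python
from typing import List, Tuple

Joint2D = Tuple[float, float]          # (u, v) pixel coordinates
Joint3D = Tuple[float, float, float]   # (X, Y, Z) camera coordinates


def lift_to_3d(
    joints_2d: List[Joint2D],
    rel_depths: List[float],
    root_depth: float,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Joint3D]:
    """Back-project 2D joints to 3D camera space with a pinhole model.

    Hypothetical sketch. ASSUMPTION: the two depth components are
    (a) each joint's depth relative to the hand root (rel_depths) and
    (b) the absolute depth of that root (root_depth).
    """
    if len(joints_2d) != len(rel_depths):
        raise ValueError("need exactly one relative depth per joint")
    joints_3d: List[Joint3D] = []
    for (u, v), dz in zip(joints_2d, rel_depths):
        z = root_depth + dz          # absolute depth of this joint
        x = (u - cx) * z / fx        # invert the projection u = fx * X / Z + cx
        y = (v - cy) * z / fy        # invert the projection v = fy * Y / Z + cy
        joints_3d.append((x, y, z))
    return joints_3d


if __name__ == "__main__":
    # Toy example: root at the principal point, 500 units from the camera.
    pts = lift_to_3d(
        joints_2d=[(128.0, 128.0), (178.0, 128.0), (128.0, 78.0)],
        rel_depths=[0.0, 0.0, 100.0],
        root_depth=500.0,
        fx=500.0, fy=500.0, cx=128.0, cy=128.0,
    )
    print(pts)  # [(0.0, 0.0, 500.0), (50.0, 0.0, 500.0), (0.0, -60.0, 600.0)]
```

For two interacting hands, the function could be called once per hand. One design used in prior interacting-hand work (again an assumption with respect to this paper) predicts the second hand's root depth relative to the first, so that both hands are placed consistently along the depth axis.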
Pages: 3801-3814
Page count: 14