A hybrid network for estimating 3D interacting hand pose from a single RGB image

Cited by: 0
Authors
Bao, Wenxia [1]
Gao, Qiuyue [1]
Yang, Xianjun [2]
Affiliations
[1] Anhui Univ, Sch Elect & Informat Engn, Hefei 230601, Anhui, Peoples R China
[2] Chinese Acad Sci, Hefei Inst Phys Sci, Hefei 230031, Anhui, Peoples R China
Keywords
3D hand pose estimation; Interacting Hand; Hybrid network; End to end network; TEXT; RECOGNITION; KHATT
DOI
10.1007/s11760-024-03043-1
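Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract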
Estimating 3D interacting hand pose from a single RGB image is a challenging problem: in two-handed interactions, the hands tend to occlude each other and are self-similar. In this study, a simple, accurate end-to-end framework called HybridPoseNet is proposed for estimating 3D interacting hand pose. The hybrid network employs an encoder-decoder architecture. More specifically, the feature encoder is a hybrid structure that combines a convolutional neural network (CNN) with a transformer to encode hand information. An ordinary CNN extracts the local detailed features of a given image, and a vision transformer captures the long-range spatial interactions between the cross-positional feature vectors. The 3D pose decoder is based on left and right network branches, which are fused via a feature enhancement module (FEM). The FEM helps reduce the ambiguity in appearance caused by the self-similarity of the hands. The decoder lifts the 2D pose to the 3D pose by estimating two depth components. Ablation experiments demonstrate the effectiveness of each module in the network. In addition, comprehensive experiments on the InterHand2.6M dataset show that the proposed method outperforms previous state-of-the-art methods for estimating interacting hand pose.
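The 2D-to-3D lifting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which two depth components are estimated, so this example assumes they are a per-joint root-relative depth and an absolute root depth, and it back-projects with a standard pinhole camera model. The function name `lift_to_3d`, its parameters, and the sample values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def lift_to_3d(
    joints_2d: List[Tuple[float, float]],
    rel_depths: List[float],
    root_depth: float,
    focal: Tuple[float, float],
    principal: Tuple[float, float],
) -> List[Tuple[float, float, float]]:
    """Back-project 2D pixel joints to camera-space 3D points (pinhole model).

    Assumption (not from the paper): the two depth components are the
    per-joint depth relative to the hand root and the absolute root depth.
    """
    fx, fy = focal
    cx, cy = principal
    joints_3d = []
    for (u, v), dz in zip(joints_2d, rel_depths):
        z = root_depth + dz        # absolute depth of this joint
        x = (u - cx) * z / fx      # pinhole back-projection along x
        y = (v - cy) * z / fy      # pinhole back-projection along y
        joints_3d.append((x, y, z))
    return joints_3d


# Hypothetical values: two joints, depths in millimetres, focal length in pixels.
points = lift_to_3d(
    joints_2d=[(100.0, 100.0), (150.0, 100.0)],
    rel_depths=[0.0, 50.0],
    root_depth=500.0,
    focal=(1000.0, 1000.0),
    principal=(100.0, 100.0),
)
```

Under these assumptions, the network would only need to predict pixel coordinates and the two depth terms; the camera intrinsics then determine the metric 3D position of each joint.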
Pages: 3801-3814
Page count: 14
Related papers
50 in total
  • [31] 3D Single Person Pose Estimation Method Based on Deep Learning
    Yuan, Xinrui
    Wang, Hairong
    Wang, Jun
    FUZZY SYSTEMS AND DATA MINING VI, 2020, 331 : 481 - 491
  • [32] 3D pose estimation of ground rigid target based on ladar range image
    Lv, Dan
    Sun, Jian-Feng
    Li, Qi
    Wang, Qi
    APPLIED OPTICS, 2013, 52 (33) : 8073 - 8081
  • [33] Hypergraph regularized autoencoder for image-based 3D human pose recovery
    Hong, Chaoqun
    Chen, Xuhui
    Wang, Xiaodong
    Tang, Chaohui
    SIGNAL PROCESSING, 2016, 124 : 132 - 140
  • [34] Pyramid Deep Fusion Network for Two-Hand Reconstruction From RGB-D Images
    Ren, Jinwei
    Zhu, Jianke
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5843 - 5855
  • [35] Semantic part segmentation method based 3D object pose estimation with RGB-D images for bin-picking
    Zhuang, Chungang
    Wang, Zhe
    Zhao, Heng
    Ding, Han
    ROBOTICS AND COMPUTER-INTEGRATED MANUFACTURING, 2021, 68
  • [36] Holistic 3D face and head reconstruction with geometric details from a single image
    Lee, Jungwoo
    Lumentut, Jonathan Samuel
    Park, In Kyu
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (26) : 38217 - 38233
  • [37] 3D pose estimation and localization of construction equipment from single camera images by virtual model integration
    Kim, Junghoon
    Chi, Seokho
    Kim, Jinwoo
    ADVANCED ENGINEERING INFORMATICS, 2023, 57
  • [38] A 3D Iris Scanner From a Single Image Using Convolutional Neural Networks
    Benalcazar, Daniel P.
    Zambrano, Jorge E.
    Bastias, Diego
    Perez, Claudio A.
    Bowyer, Kevin W.
    IEEE ACCESS, 2020, 8 : 98584 - 98599
  • [39] 3D Point Cloud Reconstruction from a Single 4D Light Field Image
    Farhood, Helia
    Perry, Stuart
    Cheng, Eva
    Kim, Juno
    OPTICS, PHOTONICS AND DIGITAL TECHNOLOGIES FOR IMAGING APPLICATIONS VI, 2021, 11353
  • [40] 3D Face Reconstruction From Single 2D Image Using Distinctive Features
    Afzal, H. M. Rehan
    Luo, Suhuai
    Afzal, M. Kamran
    Chaudhary, Gopal
    Khari, Manju
    Kumar, Sathish A. P.
    IEEE ACCESS, 2020, 8 (08) : 180681 - 180689