MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer

Cited by: 1
Authors
Wan, Xiangan [1]
Ju, Jianping [1]
Tang, Jianying [1]
Lin, Mingyu [1]
Rao, Ning [1]
Chen, Deng [2]
Liu, Tingting [1]
Li, Jing [1]
Bian, Fan [1]
Xiong, Nicholas [1]
Affiliations
[1] Hubei Business Coll, Sch Comp Sci & Technol, Wuhan 430079, Peoples R China
[2] Wuhan Inst Technol, Hubei Prov Key Lab Intelligent Robot, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
depth image; 3D hand pose estimation; multi-perspective cues; Swin Transformer; deep learning; REGRESSION; NETWORK;
DOI
10.3390/s24217029
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
The objective of 3D hand pose estimation (HPE) based on depth images is to accurately locate and predict keypoints of the hand. However, this task remains challenging because of the variations in hand appearance from different viewpoints and severe occlusions. To effectively address these challenges, this study introduces a novel approach, called the multi-perspective cue-aware joint relationship representation for 3D HPE via the Swin Transformer (MPCTrans, for short). This approach is designed to learn multi-perspective cues and essential information from hand depth images. To achieve this goal, three novel modules are proposed to utilize features from multiple virtual views of the hand, namely, the adaptive virtual multi-viewpoint (AVM), hierarchy feature estimation (HFE), and virtual viewpoint evaluation (VVE) modules. The AVM module adaptively adjusts the angles of the virtual viewpoint and learns the ideal virtual viewpoint to generate informative multiple virtual views. The HFE module estimates hand keypoints through hierarchical feature extraction. The VVE module evaluates virtual viewpoints by using chained high-level functions from the HFE module. Transformer is used as a backbone to extract the long-range semantic joint relationships in hand depth images. Extensive experiments demonstrate that the MPCTrans model achieves state-of-the-art performance on four challenging benchmark datasets.
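As a rough illustration of what a "virtual view" of a hand depth image can mean, the sketch below rotates a 3D point set to a new viewpoint and re-renders it as an orthographic depth map. It is not the paper's AVM module: the abstract says AVM learns the viewpoint angles, whereas here the angles are fixed inputs. The function names (`rotate_points`, `render_depth`, `virtual_views`), the yaw/pitch parameterization, and the orthographic projection are all assumptions made for illustration.

```python
import math


def rotate_points(points, yaw, pitch):
    """Rotate (x, y, z) points by yaw about the y-axis, then pitch about the x-axis.

    Angles are in radians. Hypothetical helper, not taken from the paper.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    rotated = []
    for x, y, z in points:
        # Yaw: rotation about the y-axis.
        x1 = cy * x + sy * z
        z1 = -sy * x + cy * z
        # Pitch: rotation about the x-axis.
        y2 = cp * y - sp * z1
        z2 = sp * y + cp * z1
        rotated.append((x1, y2, z2))
    return rotated


def render_depth(points, size, extent):
    """Orthographic z-buffer render onto a size x size grid covering [-extent, extent).

    Each pixel keeps the smallest z that lands on it; empty pixels stay None.
    """
    image = [[None] * size for _ in range(size)]
    scale = size / (2.0 * extent)
    for x, y, z in points:
        col = math.floor((x + extent) * scale)
        row = math.floor((y + extent) * scale)
        if 0 <= row < size and 0 <= col < size:
            if image[row][col] is None or z < image[row][col]:
                image[row][col] = z
    return image


def virtual_views(points, angles, size=8, extent=1.0):
    """Render one depth map per (yaw, pitch) pair in `angles`."""
    return [
        render_depth(rotate_points(points, yaw, pitch), size, extent)
        for yaw, pitch in angles
    ]
```

Calling `virtual_views(points, [(0.0, 0.0), (math.pi / 2, 0.0)])` would give the original view plus one seen from the side. In a learned setting the angle pairs would be model outputs rather than constants.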
Pages: 17
Related papers
20 in total
  • [1] 3D hand pose and mesh estimation via a generic Topology-aware Transformer model
    Yu, Shaoqi
    Wang, Yintong
    Chen, Lili
    Zhang, Xiaolin
    Li, Jiamao
    FRONTIERS IN NEUROROBOTICS, 2024, 18
  • [2] HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation
    Cheng, Wencan
    Kim, Eunji
    Ko, Jong Hwan
    COMPUTER VISION - ECCV 2024, PT LXXXVIII, 2025, 15146 : 35 - 52
  • [3] Orientation Cues-Aware Facial Relationship Representation for Head Pose Estimation via Transformer
    Liu, Hai
    Zhang, Cheng
    Deng, Yongjian
    Liu, Tingting
    Zhang, Zhaoli
    Li, You-Fu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 6289 - 6302
  • [4] 3D Hand Pose Estimation via Graph-Based Reasoning
    Song, Jae-Hun
    Kang, Suk-Ju
    IEEE ACCESS, 2021, 9 : 35824 - 35833
  • [5] 3D human pose estimation with multi-hypotheses gated transformer
    Dong, Xiena
    Zhang, Jian
    Yu, Jun
    Yu, Ting
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [6] Refining Weights for Enhanced Object Similarity in Multi-perspective 6Dof Pose Estimation and 3D Object Detection
    Kusumo, Budiarianto Suryo
    Thomas, Ulrike
    DEEP LEARNING THEORY AND APPLICATIONS, PT I, DELTA 2024, 2024, 2171 : 310 - 327
  • [7] MTMVC: Semi-supervised 3D hand pose estimation using multi-task and multi-view consistency
    Xiang, Donghai
    Xu, Wei
    Zhang, Yuting
    Peng, Bei
    Wang, Guotai
    Li, Kang
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95
  • [8] 3D Capsule Hand Pose Estimation Network Based on Structural Relationship Information
    Wu, Yiqi
    Ma, Shichao
    Zhang, Dejun
    Sun, Jun
    SYMMETRY-BASEL, 2020, 12 (10): 1 - 14
  • [9] LPPM-Net: Local-aware point processing module based 3D hand pose estimation for point cloud
    Yang, Jian
    Ma, Xiaohong
    Sun, Yi
    Lin, Xiangbo
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2021, 90
  • [10] Joint multi-scale transformers and pose equivalence constraints for 3D human pose estimation
    Wu, Yongpeng
    Kong, Dehui
    Gao, Junna
    Li, Jinghua
    Yin, Baocai
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 103