MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer

Cited by: 1

Authors:

Wan, Xiangan [1]
Ju, Jianping [1]
Tang, Jianying [1]
Lin, Mingyu [1]
Rao, Ning [1]
Chen, Deng [2]
Liu, Tingting [1]
Li, Jing [1]
Bian, Fan [1]
Xiong, Nicholas [1]

Affiliations:

[1] Hubei Business Coll, Sch Comp Sci & Technol, Wuhan 430079, Peoples R China

[2] Wuhan Inst Technol, Hubei Prov Key Lab Intelligent Robot, Wuhan 430079, Peoples R China

Source:

SENSORS | 2024 / Vol. 24 / Issue 21

Funding:

National Natural Science Foundation of China;

Keywords:

depth image; 3D hand pose estimation; multi-perspective cues; Swin Transformer; deep learning; REGRESSION; NETWORK;

DOI:

10.3390/s24217029

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification codes:

070302 ; 081704 ;

Abstract:

The objective of 3D hand pose estimation (HPE) based on depth images is to accurately locate and predict keypoints of the hand. However, this task remains challenging because of the variations in hand appearance from different viewpoints and severe occlusions. To effectively address these challenges, this study introduces a novel approach, called the multi-perspective cue-aware joint relationship representation for 3D HPE via the Swin Transformer (MPCTrans, for short). This approach is designed to learn multi-perspective cues and essential information from hand depth images. To achieve this goal, three novel modules are proposed to utilize features from multiple virtual views of the hand, namely, the adaptive virtual multi-viewpoint (AVM), hierarchy feature estimation (HFE), and virtual viewpoint evaluation (VVE) modules. The AVM module adaptively adjusts the angles of the virtual viewpoint and learns the ideal virtual viewpoint to generate informative multiple virtual views. The HFE module estimates hand keypoints through hierarchical feature extraction. The VVE module evaluates virtual viewpoints by using chained high-level functions from the HFE module. Transformer is used as a backbone to extract the long-range semantic joint relationships in hand depth images. Extensive experiments demonstrate that the MPCTrans model achieves state-of-the-art performance on four challenging benchmark datasets.


Pages: 17

20 items in total

[1] 3D hand pose and mesh estimation via a generic Topology-aware Transformer model
Yu, Shaoqi
Wang, Yintong
Chen, Lili
Zhang, Xiaolin
Li, Jiamao
FRONTIERS IN NEUROROBOTICS, 2024, 18
[2] HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation
Cheng, Wencan
Kim, Eunji
Ko, Jong Hwan
COMPUTER VISION - ECCV 2024, PT LXXXVIII, 2025, 15146 : 35 - 52
[3] Orientation Cues-Aware Facial Relationship Representation for Head Pose Estimation via Transformer
Liu, Hai
Zhang, Cheng
Deng, Yongjian
Liu, Tingting
Zhang, Zhaoli
Li, You-Fu
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 6289 - 6302
[4] 3D Hand Pose Estimation via Graph-Based Reasoning
Song, Jae-Hun
Kang, Suk-Ju
IEEE ACCESS, 2021, 9 : 35824 - 35833
[5] 3D human pose estimation with multi-hypotheses gated transformer
Dong, Xiena
Zhang, Jian
Yu, Jun
Yu, Ting
MULTIMEDIA SYSTEMS, 2024, 30 (06)
[6] Refining Weights for Enhanced Object Similarity in Multi-perspective 6Dof Pose Estimation and 3D Object Detection
Kusumo, Budiarianto Suryo
Thomas, Ulrike
DEEP LEARNING THEORY AND APPLICATIONS, PT I, DELTA 2024, 2024, 2171 : 310 - 327
[7] MTMVC: Semi-supervised 3D hand pose estimation using multi-task and multi-view consistency
Xiang, Donghai
Xu, Wei
Zhang, Yuting
Peng, Bei
Wang, Guotai
Li, Kang
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95
[8] 3D Capsule Hand Pose Estimation Network Based on Structural Relationship Information
Wu, Yiqi
Ma, Shichao
Zhang, Dejun
Sun, Jun
SYMMETRY-BASEL, 2020, 12 (10) : 1 - 14
[9] LPPM-Net: Local-aware point processing module based 3D hand pose estimation for point cloud
Yang, Jian
Ma, Xiaohong
Sun, Yi
Lin, Xiangbo
SIGNAL PROCESSING-IMAGE COMMUNICATION, 2021, 90
[10] Joint multi-scale transformers and pose equivalence constraints for 3D human pose estimation
Wu, Yongpeng
Kong, Dehui
Gao, Junna
Li, Jinghua
Yin, Baocai
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 103
