3D hand pose and mesh estimation via a generic Topology-aware Transformer model

Cited by: 1

Authors:

Yu, Shaoqi [1,2]

Wang, Yintong [1,2]

Chen, Lili [1,2]

Zhang, Xiaolin [1,2,3]

Li, Jiamao [1,2]

Affiliations:

[1] Chinese Acad Sci, Shanghai Inst Microsyst & Informat Technol, Shanghai, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] ShanghaiTech Univ, Shanghai, Peoples R China

Source:

FRONTIERS IN NEUROROBOTICS | 2024 / Volume 18

Keywords:

3D hand pose estimation; HandGCNFormer; 3D hand mesh estimation; Graphformer; Transformer; GCN; REGRESSION;

DOI:

10.3389/fnbot.2024.1395652

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In Human-Robot Interaction (HRI), accurate 3D hand pose and mesh estimation is critically important. However, inferring reasonable and accurate poses under severe self-occlusion and high self-similarity remains an inherent challenge. To alleviate the ambiguity caused by invisible and similar joints during HRI, we propose a new Topology-aware Transformer network named HandGCNFormer, which takes a depth image as input and incorporates prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we propose a novel Graphformer decoder with an additional Node-offset Graph Convolutional layer (NoffGConv). The Graphformer decoder optimizes the synergy between the Transformer and GCN, capturing long-range dependencies and local topological connections between joints. On top of that, we replace the standard MLP prediction head with a novel Topology-aware head to better exploit local topological constraints for more reasonable and accurate poses. Our method achieves state-of-the-art 3D hand pose estimation performance on four challenging datasets: Hands2017, NYU, ICVL, and MSRA. To further demonstrate the effectiveness and scalability of the proposed Graphformer decoder and Topology-aware head, we extend our framework to HandGCNFormer-Mesh for the 3D hand mesh estimation task. The extended framework efficiently integrates a shape regressor with the original Graphformer decoder and Topology-aware head, producing MANO parameters. Results on the HO-3D dataset, which contains varied and challenging occlusions, show that HandGCNFormer-Mesh is competitive with previous state-of-the-art 3D hand mesh estimation methods.

Pages: 15
