3D hand pose and mesh estimation via a generic Topology-aware Transformer model

Cited by: 0

Authors:

Yu, Shaoqi [1,2]

Wang, Yintong [1,2]

Chen, Lili [1,2]

Zhang, Xiaolin [1,2,3]

Li, Jiamao [1,2]

Affiliations:

[1] Chinese Acad Sci, Shanghai Inst Microsyst & Informat Technol, Shanghai, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] ShanghaiTech Univ, Shanghai, Peoples R China

Source:

FRONTIERS IN NEUROROBOTICS | 2024 / Volume 18

Keywords:

3D hand pose estimation; HandGCNFormer; 3D hand mesh estimation; Graphformer; Transformer; GCN; REGRESSION;

DOI:

10.3389/fnbot.2024.1395652

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In Human-Robot Interaction (HRI), accurate 3D hand pose and mesh estimation is critically important. However, inferring plausible and accurate poses under severe self-occlusion and high self-similarity remains an inherent challenge. To alleviate the ambiguity caused by invisible and similar joints during HRI, we propose a new Topology-aware Transformer network named HandGCNFormer that takes a depth image as input and incorporates prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we propose a novel Graphformer decoder with an additional Node-offset Graph Convolutional layer (NoffGConv). The Graphformer decoder optimizes the synergy between the Transformer and GCN, capturing both long-range dependencies and local topological connections between joints. In addition, we replace the standard MLP prediction head with a novel Topology-aware head to better exploit local topological constraints for more reasonable and accurate poses. Our method achieves state-of-the-art 3D hand pose estimation performance on four challenging datasets: Hands2017, NYU, ICVL, and MSRA. To further demonstrate the effectiveness and scalability of the proposed Graphformer decoder and Topology-aware head, we extend our framework to HandGCNFormer-Mesh for the 3D hand mesh estimation task. The extended framework efficiently integrates a shape regressor with the original Graphformer decoder and Topology-aware head, producing MANO parameters. Results on the HO-3D dataset, which contains varied and challenging occlusions, show that HandGCNFormer-Mesh is competitive with previous state-of-the-art 3D hand mesh estimation methods.
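
As a rough illustration of what "incorporating prior knowledge of hand kinematic topology" can mean in practice, the sketch below applies one generic graph-convolution step over a hand skeleton graph. It is not the paper's NoffGConv layer or Graphformer decoder. The 21-joint ordering, the feature width of 8, and the names `EDGES`, `normalized_adjacency`, and `graph_conv` are assumptions made for this example only.

```python
import numpy as np

# Assumed 21-joint layout: 0 = wrist, then 4 joints per finger (thumb to pinky).
# This ordering is illustrative; each dataset defines its own joint order.
NUM_JOINTS = 21
EDGES = [(0, 1 + 4 * f) for f in range(5)] + [
    (1 + 4 * f + k, 2 + 4 * f + k) for f in range(5) for k in range(3)
]


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    a = np.eye(num_nodes)  # self-loops
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(features, adj_norm, weight):
    """One generic graph-convolution step: ReLU(A_hat @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


rng = np.random.default_rng(0)
adj = normalized_adjacency(NUM_JOINTS, EDGES)
joint_features = rng.standard_normal((NUM_JOINTS, 8))  # hypothetical per-joint features
weight = rng.standard_normal((8, 8)) * 0.1             # random stand-in for learned weights
out = graph_conv(joint_features, adj, weight)          # shape (21, 8)
```

Each joint's output mixes only its own features and those of its skeletal neighbours, which is the kind of local topological constraint the abstract contrasts with the long-range dependencies captured by Transformer attention.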


Pages: 15
