Geometry-Aware 3D Hand-Object Pose Estimation Under Occlusion via Hierarchical Feature Decoupling

Cited by: 0
Authors
Cai, Yuting [1 ]
Pan, Huimin [1 ]
Yang, Jiayi [1 ]
Liu, Yichen [1 ]
Gao, Quanli [1 ]
Wang, Xihan [1 ]
Affiliations
[1] Xian Polytech Univ, Sch Comp Sci, Xian 710600, Peoples R China
Source
ELECTRONICS | 2025, Vol. 14, Issue 05
Keywords
hand-object interaction; 3D hand-object pose estimation; feature fusion
DOI
10.3390/electronics14051029
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Hand-object occlusion poses a significant challenge for 3D pose estimation: during hand-object interaction, parts of the hand or object are frequently occluded by the other, making it difficult to extract discriminative features for accurate pose estimation. Traditional methods typically extract features for both the hand and the object from a single image using a shared backbone network, which often results in feature contamination, where hand and object features are mixed, especially in occluded regions. To address these issues, we propose a novel 3D hand-object pose estimation framework that explicitly tackles occlusion through two key innovations. First, instead of relying on a single backbone for feature extraction, our framework introduces a hierarchical feature decoupling strategy: low-level features from a shared ResNet-50 capture the interaction context, while high-level features are separated into two independent branches. This design processes hand-specific and object-specific features separately, reducing feature contamination and improving pose estimation accuracy under occlusion. Second, recognizing the correlation between the hand's occluded regions and the object's geometry, we introduce the Hand-Object Cross-Attention Transformer (HOCAT) module. Unlike conventional attention mechanisms that focus solely on feature correlations, HOCAT leverages the geometric stability of the object as prior knowledge to guide the reconstruction of occluded hand regions: the object features (key/value) provide contextual information that enhances the hand features (query), enabling the model to infer the positions of occluded hand joints from the object's known structure. This significantly improves the model's ability to handle complex occlusion scenarios. Experimental results on publicly available datasets such as HO3D V2 and Dex-YCB demonstrate that our method achieves significant improvements in hand-object pose estimation. On HO3D V2, the PA-MPJPE reaches 9.1 mm, the PA-MPVPE reaches 9.0 mm, and the F-score reaches 95.8%.
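
The PyTorch sketch below is a minimal illustration of the two ideas summarized in the abstract: sharing the early ResNet-50 stages between hand and object while splitting the later stages into two branches, and a cross-attention layer in which hand tokens (query) attend to object tokens (key/value). The split point after layer2, the channel sizes, the head count, and the module names are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal sketch, assuming a ResNet-50 split after layer2 and a single
# multi-head cross-attention layer; all names and hyperparameters are
# illustrative, not the paper's exact design.
import copy

import torch
import torch.nn as nn
import torchvision.models as models


class DecoupledBackbone(nn.Module):
    """Shared low-level ResNet-50 stages, then separate hand/object branches."""

    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=None)
        # Shared low-level stages (conv1 .. layer2) capture the joint
        # hand-object interaction context.
        self.shared = nn.Sequential(
            resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
            resnet.layer1, resnet.layer2,
        )
        # Independent high-level stages (layer3, layer4) keep hand and
        # object features apart, reducing contamination under occlusion.
        self.hand_branch = nn.Sequential(resnet.layer3, resnet.layer4)
        self.obj_branch = copy.deepcopy(self.hand_branch)

    def forward(self, image):
        low = self.shared(image)                              # (B, 512, H/8, W/8)
        return self.hand_branch(low), self.obj_branch(low)    # (B, 2048, H/32, W/32) each


class HOCAT(nn.Module):
    """Cross-attention: hand features (query) attend to object features (key/value)."""

    def __init__(self, dim=2048, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, hand_feat, obj_feat):
        # Flatten spatial maps into token sequences: (B, C, H, W) -> (B, H*W, C).
        q = hand_feat.flatten(2).transpose(1, 2)
        kv = obj_feat.flatten(2).transpose(1, 2)
        # The object's comparatively stable geometry supplies context that
        # refines hand features in occluded regions.
        ctx, _ = self.attn(query=q, key=kv, value=kv)
        out = self.norm(q + ctx)                              # residual + layer norm
        # Restore the spatial layout for downstream pose/mesh heads.
        return out.transpose(1, 2).reshape_as(hand_feat)


if __name__ == "__main__":
    backbone, hocat = DecoupledBackbone(), HOCAT()
    hand_f, obj_f = backbone(torch.randn(1, 3, 256, 256))
    print(hocat(hand_f, obj_f).shape)                         # torch.Size([1, 2048, 8, 8])
```

Splitting after layer2 is only one plausible reading of "low-level" versus "high-level" features; the paper may place the split elsewhere or add further refinement stages before the pose heads.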
Pages: 15