Multi-Modal Hand-Object Pose Estimation With Adaptive Fusion and Interaction Learning

Cited by: 5

Authors:

Hoang, Dinh-Cuong [1]

Tan, Phan Xuan [2]

Nguyen, Anh-Nhat [1]

Vu, Duy-Quang [1]

Vu, Van-Duc [1]

Nguyen, Thu-Uyen [1]

Hoang, Ngoc-Anh [1]

Phan, Khanh-Toan [1]

Tran, Duc-Thanh [1]

Nguyen, Van-Thiep [1]

Duong, Quang-Tri [1]

Ho, Ngoc-Trung [1]

Tran, Cong-Trinh [1]

Duong, Van-Hiep [1]

Ngo, Phuc-Quan [1]

Affiliations:

[1] FPT Univ, IT Dept, Hanoi 10000, Vietnam

[2] Shibaura Inst Technol, Coll Engn, Koto City, Tokyo 1358548, Japan

Source:

IEEE Access | 2024 / Vol. 12

Keywords:

Feature extraction; Three-dimensional displays; Shape; Pose estimation; Image color analysis; Task analysis; Solid modeling; Robot vision systems; Intelligent systems; Deep learning; Supervised learning; Machine vision

DOI:

10.1109/ACCESS.2024.3388870

Chinese Library Classification (CLC) code:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Hand-object configuration recovery is an important task in computer vision. Estimating the pose and shape of both hands and objects during interaction has applications in augmented reality, virtual reality, and imitation-based robot learning. The problem is particularly challenging when the hand is interacting with objects in the environment, as this setting features both extreme occlusions and non-trivial shape deformations. While existing works treat the problem of estimating hand configurations (that is, pose and shape parameters) in isolation from the recovery of parameters related to the object acted upon, we posit that the two problems are related and can be solved more accurately when addressed jointly. We introduce an approach that jointly learns the features of hand and object from color and depth (RGB-D) images. Our approach fuses appearance and geometric features adaptively, which allows us to emphasize or suppress features according to how meaningful they are for the downstream task of hand-object configuration recovery. We combine a deep Hough voting strategy that builds on our adaptive features with a graph convolutional network (GCN) to learn the interaction relationships between the shapes of the hand and the held object. Experimental results demonstrate that our proposed approach consistently outperforms state-of-the-art methods on popular datasets.
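The abstract does not specify how the adaptive fusion is computed. As a rough illustration only, the general idea of emphasizing or suppressing appearance versus geometric features can be sketched as a per-channel gate that blends the two feature vectors. The function name `adaptive_fuse`, the parameters `gate_weights` and `gate_bias`, and the sigmoid gating itself are assumptions made for this sketch and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_fuse(appearance, geometry, gate_weights, gate_bias):
    """Blend two equal-length feature vectors with a per-channel gate.

    Hypothetical sketch: channel i gets a gate alpha_i in (0, 1) computed
    from the concatenated features; alpha_i near 1 favors the appearance
    (RGB) feature, alpha_i near 0 favors the geometric (depth) feature.
    gate_weights[i] is a list of length 2 * len(appearance).
    """
    if len(appearance) != len(geometry):
        raise ValueError("feature vectors must have the same length")
    joint = list(appearance) + list(geometry)
    fused = []
    for i, (a, g) in enumerate(zip(appearance, geometry)):
        # Linear score over the joint feature, squashed into a gate value.
        logit = sum(w * x for w, x in zip(gate_weights[i], joint)) + gate_bias[i]
        alpha = sigmoid(logit)
        fused.append(alpha * a + (1.0 - alpha) * g)
    return fused


# Illustrative values only: zero weights and biases give a gate of 0.5,
# so each fused channel is the plain average of the two inputs.
rgb_feat = [1.0, 0.0]
depth_feat = [0.0, 2.0]
zero_w = [[0.0] * 4, [0.0] * 4]
print(adaptive_fuse(rgb_feat, depth_feat, zero_w, [0.0, 0.0]))
```

In a trained network the gate parameters would be learned end to end, so the blend could shift toward depth features where color is unreliable (for example, under occlusion) and toward color features elsewhere.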


Pages: 54339-54351

Page count: 13
