Multi-Modal Hand-Object Pose Estimation With Adaptive Fusion and Interaction Learning

Cited by: 3
Authors
Hoang, Dinh-Cuong [1]
Tan, Phan Xuan [2]
Nguyen, Anh-Nhat [1]
Vu, Duy-Quang [1]
Vu, Van-Duc [1]
Nguyen, Thu-Uyen [1]
Hoang, Ngoc-Anh [1]
Phan, Khanh-Toan [1]
Tran, Duc-Thanh [1]
Nguyen, Van-Thiep [1]
Duong, Quang-Tri [1]
Ho, Ngoc-Trung [1]
Tran, Cong-Trinh [1]
Duong, Van-Hiep [1]
Ngo, Phuc-Quan [1]
Affiliations
[1] FPT Univ, IT Dept, Hanoi 10000, Vietnam
[2] Shibaura Inst Technol, Coll Engn, Koto City, Tokyo 1358548, Japan
Keywords
Feature extraction; Three-dimensional displays; Shape; Pose estimation; Image color analysis; Task analysis; Solid modeling; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision
DOI
10.1109/ACCESS.2024.3388870
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Hand-object configuration recovery is an important task in computer vision. Estimating the pose and shape of both hands and objects during interaction has various applications, particularly in augmented reality, virtual reality, and imitation-based robot learning. The problem is particularly challenging when the hand is interacting with objects in the environment, as this setting features both extreme occlusions and non-trivial shape deformations. While existing works treat the problem of estimating hand configurations (that is, pose and shape parameters) in isolation from the recovery of the parameters of the manipulated object, we posit that the two problems are related and can be solved more accurately when addressed jointly. We introduce an approach that jointly learns hand and object features from color and depth (RGB-D) images. Our approach fuses appearance and geometric features adaptively, which allows us to emphasize the features that are more meaningful for the downstream task of hand-object configuration recovery and to suppress those that are less so. We combine a deep Hough voting strategy that builds on our adaptive features with a graph convolutional network (GCN) to learn the relationships between the hand and held-object shapes during interaction. Experimental results demonstrate that our proposed approach consistently outperforms state-of-the-art methods on popular datasets.
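The adaptive fusion idea in the abstract can be illustrated with a small sketch. The record does not give the paper's actual architecture, so the following is only one common way to realize such a scheme: a per-channel sigmoid gate, computed from the concatenated appearance and geometric features of a point, decides how much of each modality is kept. The function names (`sigmoid`, `gated_fusion`) and the parameters `gate_weights` and `gate_bias` are hypothetical and stand in for weights that a network would learn.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(appearance, geometry, gate_weights, gate_bias):
    """Fuse two per-point feature vectors with a per-channel gate.

    Illustrative assumption, not the paper's published layer.
    appearance, geometry: lists of C floats (e.g. RGB and depth features).
    gate_weights: C rows of 2*C floats; gate_bias: C floats.
    Returns (fused, gate), where gate[c] near 1 favors appearance
    and gate[c] near 0 favors geometry for channel c.
    """
    if len(appearance) != len(geometry):
        raise ValueError("feature vectors must have the same length")
    concat = list(appearance) + list(geometry)
    # One linear unit per channel followed by a sigmoid yields the gate.
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        for row, b in zip(gate_weights, gate_bias)
    ]
    # Convex combination of the two modalities, channel by channel.
    fused = [g * a + (1.0 - g) * d for g, a, d in zip(gate, appearance, geometry)]
    return fused, gate


if __name__ == "__main__":
    rgb_feat = [1.0, 2.0]
    depth_feat = [3.0, 6.0]
    weights = [[0.0] * 4 for _ in range(2)]  # untrained gate: all zeros
    print(gated_fusion(rgb_feat, depth_feat, weights, [0.0, 0.0]))
```

With all-zero gate parameters, every gate value is 0.5 and the result is the plain average of the two modalities. Training would move the gate away from 0.5 wherever one modality is more informative, for example toward geometry in regions where the hand occludes the object's texture.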
Pages: 54339-54351
Page count: 13