Deep Fusion for Multi-Modal 6D Pose Estimation

Cited by: 12
Authors
Lin, Shifeng [1 ]
Wang, Zunran [2 ]
Zhang, Shenghao [2 ]
Ling, Yonggen [2 ]
Yang, Chenguang [3 ]
Affiliations
[1] South China Univ Technol, Sch Automat Sci & Engn, Key Lab Autonomous Syst & Networked Control, Guangzhou 510640, Peoples R China
[2] Tencent, Robot X, Shenzhen 518057, Peoples R China
[3] Univ West England, Bristol Robot Lab, Bristol BS16 1QY, England
Keywords
Pose estimation; Point cloud compression; Feature extraction; Decoding; Fuses; Solid modeling; Encoding; Deep learning for visual perception; RGB-D perception; pose estimation; OBJECT; NETWORK;
DOI
10.1109/TASE.2023.3327772
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
6D pose estimation from a single modality encounters difficulties due to the limitations of that modality, such as RGB information on textureless objects and depth on reflective objects. This can be improved by exploiting the complementarity between modalities. Most previous methods only consider the correspondence between point clouds and RGB images and directly extract the features of the two corresponding modalities for fusion; they ignore the information of the modality itself and are negatively affected by erroneous background information when introducing more features for fusion. To enhance the complementarity between multiple modalities, we propose a neighbor-based cross-modality attention mechanism for multi-modal 6D pose estimation. "Neighbor-based" means that the RGB features of multiple neighbors are applied for fusion, which expands the receptive field. The cross-modality attention mechanism leverages the similarities between the different modal features to guide modal feature fusion, which reduces the negative impact of incorrect background information. Moreover, we design features comparing the rendered image with the original image to obtain the confidence of pose estimation results. Experimental results on the LM, LM-O and YCB-V datasets demonstrate the effectiveness of our method. Video is available at https://www.youtube.com/watch?v=ApNBcX6NEGs.
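The neighbor-based cross-modality attention described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, tensor shapes, scaled dot-product similarity, and concatenation fusion are all illustrative assumptions; the only ideas taken from the abstract are (a) gathering RGB features from multiple neighbors per point and (b) weighting them by cross-modal feature similarity before fusion.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def neighbor_cross_modal_fusion(point_feats, rgb_feats, neighbor_idx):
    """Illustrative sketch (not the paper's code): fuse each point's
    geometric feature with the RGB features of its k neighboring pixels,
    weighted by cross-modal similarity.

    point_feats : (N, C) per-point geometric features
    rgb_feats   : (M, C) per-pixel RGB features
    neighbor_idx: (N, k) indices into rgb_feats of each point's neighbors
    """
    # Gather the k neighboring RGB features for every point: (N, k, C).
    neigh = rgb_feats[neighbor_idx]
    # Attention logits = similarity between each point feature and its
    # neighboring RGB features (scaled dot product, an assumption here).
    C = point_feats.shape[1]
    logits = np.einsum('nc,nkc->nk', point_feats, neigh) / np.sqrt(C)
    weights = softmax(logits, axis=1)                    # (N, k)
    # Attention-weighted sum of neighbor RGB features: (N, C).
    # Dissimilar (e.g. background) neighbors receive low weight.
    fused_rgb = np.einsum('nk,nkc->nc', weights, neigh)
    # Concatenate the two modalities for a downstream pose head: (N, 2C).
    return np.concatenate([point_feats, fused_rgb], axis=1)
```

Using multiple neighbors (k > 1) rather than the single corresponding pixel is what expands the receptive field, while the similarity weighting is what suppresses erroneous background pixels that happen to fall among the neighbors.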
Pages: 6540-6549 (10 pages)