Bi-directional attention based RGB-D fusion for category-level object pose and shape estimation

Times cited: 0
Authors
Tang, Kaifeng [1, 2, 3]
Xu, Chi [1, 2, 3]
Chen, Ming [1, 2, 3]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Hubei Key Lab Adv Control & Intelligent Automat Co, Wuhan, Hubei, Peoples R China
[3] Minist Educ, Engn Res Ctr Intelligent Technol Geoexplorat, Wuhan, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Object pose estimation; Object shape estimation; Attention; RGB-D image; Robotic vision;
DOI
10.1007/s11042-023-17626-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
RGB-D images contain color and geometric information, which are complementary for object pose and shape estimation. A dense-fusion scheme is normally used to fuse the features extracted from the RGB and depth channels for pose estimation of instance-level objects. For category-level objects, however, the effectiveness of dense-fusion features is degraded by the significant intra-class variations in color and geometry. To address this problem, we propose AttentionFusion, a bi-directional attention-based RGB-D fusion framework for category-level object pose and shape estimation. In this framework, the complex contextual relationship between the color and geometric features is effectively explored on a global scale by a bi-directional cross-attention mechanism for feature fusion. Based on the fused feature, the 6D pose of the category-level object instance is refined iteratively, and the object shape is also estimated precisely. Experimental results show that the proposed method achieves state-of-the-art performance for object pose and shape estimation on the REAL275 dataset.
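The abstract describes the bi-directional cross-attention fusion only at a high level. The following is a minimal sketch in plain Python (no deep-learning framework) of how such a fusion step can be structured: color features query the geometric features, geometric features query the color features, and each attention is global over all N points. The shapes (N points with d-dimensional features per modality), the residual connections, the per-point concatenation, the random (untrained) weights, and all function names (`cross_attention`, `bidirectional_fusion`, etc.) are assumptions for illustration, not the published AttentionFusion architecture.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product attention: every query row attends to ALL context rows."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose keys to (d x N)
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]  # (N x N)
    return matmul(softmax_rows(scores), v)  # (N x d)


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def bidirectional_fusion(rgb, geo, params):
    """Fuse per-point color and geometric features in both directions.

    rgb, geo: N x d feature lists for the same N sampled points (assumed).
    Returns an N x 2d fused feature list.
    """
    rgb_ctx = cross_attention(rgb, geo, *params["rgb_to_geo"])  # color queries geometry
    geo_ctx = cross_attention(geo, rgb, *params["geo_to_rgb"])  # geometry queries color
    fused = []
    for r, rc, g, gc in zip(rgb, rgb_ctx, geo, geo_ctx):
        rgb_out = [a + b for a, b in zip(r, rc)]  # residual connection (assumption)
        geo_out = [a + b for a, b in zip(g, gc)]
        fused.append(rgb_out + geo_out)  # per-point concatenation (assumption)
    return fused


if __name__ == "__main__":
    rng = random.Random(0)
    n_points, dim = 4, 8
    rgb_feats = random_matrix(n_points, dim, rng)
    geo_feats = random_matrix(n_points, dim, rng)
    params = {
        key: tuple(random_matrix(dim, dim, rng) for _ in range(3))
        for key in ("rgb_to_geo", "geo_to_rgb")
    }
    fused = bidirectional_fusion(rgb_feats, geo_feats, params)
    print(len(fused), len(fused[0]))  # 4 16
```

In a trained network the projection matrices would be learned, and multi-head attention and normalization layers would typically be added; the plain-list arithmetic here only makes the data flow between the two modalities explicit.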
Pages: 53043-53063
Number of pages: 21
References
64 entries in total
[1] Avetisyan, Armen; Dahnert, Manuel; Dai, Angela; Savva, Manolis; Chang, Angel X.; Niessner, Matthias. Scan2CAD: Learning CAD Model Alignment in RGB-D Scans. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 2609-2618.
[2] Brachmann E, 2014, Lecture Notes in Computer Science, vol. 8690, p. 536, DOI 10.1007/978-3-319-10605-2_35.
[3] Chen, Chun-Fu; Fan, Quanfu; Panda, Rameswar. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 347-356.
[4] Chen, Kai; Dou, Qi. SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 2753-2762.
[5] Chen Wang, 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA), p. 10059, DOI 10.1109/ICRA40945.2020.9196679.
[6] Chen, Wei; Jia, Xi; Chang, Hyung Jin; Duan, Jinming; Shen, Linlin; Leonardis, Ales. FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021: 1581-1590.
[7] Chen, Xiaozhi; Ma, Huimin; Wan, Ji; Li, Bo; Xia, Tian. Multi-View 3D Object Detection Network for Autonomous Driving. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 6526-6534.
[8] Dengsheng Chen, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Proceedings, p. 11970, DOI 10.1109/CVPR42600.2020.01199.
[9] Di, Yan; Zhang, Ruida; Lou, Zhiqiang; Manhardt, Fabian; Ji, Xiangyang; Navab, Nassir; Tombari, Federico. GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 6771-6781.
[10] Dosovitskiy A, 2021, arXiv, arXiv:2010.11929.