Category-Level Pose Estimation and Iterative Refinement for Monocular RGB-D Image

Cited by: 2

Authors:

Bao, Yongtang [1]

Qi, Yutong [2]

Su, Chunjian [1]

Geng, Yanbing [3]

Li, Haojie [1]

Affiliations:

[1] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao, Peoples R China

[2] Univ Toronto, Dept Comp & Math Sci, Scarborough, ON, Canada

[3] North Univ China, Sch Data Sci & Technol, Taiyuan, Peoples R China

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2024 / Vol. 20 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Deep learning; category-level pose estimation; scene understanding; transformer;

DOI:

10.1145/3695877

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Category-level pose estimation is proposed to predict the 6D pose of objects under a specific category and has wide applications in fields such as robotics, virtual reality, and autonomous driving. With the development of VR/AR technology, pose estimation has gradually become a research hotspot in 3D scene understanding. However, most methods fail to fully utilize geometric and color information to solve intra-class shape variations, which leads to inaccurate prediction results. To solve the above problems, we propose a novel pose estimation and iterative refinement network, use an attention mechanism to fuse multi-modal information to obtain color features after a coordinate transformation, and design iterative modules to ensure the accuracy of object geometric features. Specifically, we use an encoder-decoder architecture to implicitly generate a coarse-grained initial pose and refine it through an iterative refinement module. In addition, due to the differences between rotation and position estimation, we design a multi-head pose decoder that utilizes the local geometry and global features. Finally, we design a transformer-based coordinate transformation attention module to extract pose-sensitive features from RGB images and supervise color information by correlating point cloud features in different coordinate systems. We train and test our network on the synthetic dataset CAMERA25 and the real dataset REAL275. Experimental results show that our method achieves state-of-the-art performance on multiple evaluation metrics.
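
The coarse-to-fine idea in the abstract (an initial pose that is then improved by an iterative refinement module) can be illustrated with a minimal sketch. This is not the authors' network: `make_toy_refiner` is a hypothetical stand-in for a learned refinement module, it is handed the target pose directly, and it only handles rotations about the z axis. All function names, the step size, and the example poses are assumptions made for illustration.

```python
import math

def rot_z(deg):
    """3x3 rotation matrix for a rotation of `deg` degrees about the z axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_error_deg(r_a, r_b):
    """Geodesic angle between two rotation matrices, in degrees."""
    r_b_t = [list(col) for col in zip(*r_b)]  # transpose of r_b
    rel = matmul(r_a, r_b_t)
    cos_angle = (rel[0][0] + rel[1][1] + rel[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def make_toy_refiner(target_yaw_deg, target_trans, step=0.5):
    """Hypothetical refiner: returns a residual pose that closes a fraction
    `step` of the remaining error. A real module would predict this residual
    from image and point cloud features instead of knowing the target."""
    def predict_residual(rot, trans):
        yaw = math.degrees(math.atan2(rot[1][0], rot[0][0]))
        d_rot = rot_z(step * (target_yaw_deg - yaw))
        d_trans = [step * (g - t) for g, t in zip(target_trans, trans)]
        return d_rot, d_trans
    return predict_residual

def refine(rot, trans, predict_residual, iterations=3):
    """Apply residual pose updates repeatedly to a coarse initial pose."""
    for _ in range(iterations):
        d_rot, d_trans = predict_residual(rot, trans)
        rot = matmul(d_rot, rot)                       # compose rotation update
        trans = [t + d for t, d in zip(trans, d_trans)]  # add translation update
    return rot, trans

# Assumed example: target pose is 40 degrees of yaw and 8 cm along x;
# the coarse initial pose is the identity.
target_rot, target_trans = rot_z(40.0), [0.08, 0.0, 0.0]
coarse_rot, coarse_trans = rot_z(0.0), [0.0, 0.0, 0.0]
refiner = make_toy_refiner(40.0, target_trans)
rot, trans = refine(coarse_rot, coarse_trans, refiner, iterations=3)
rot_err = rotation_error_deg(rot, target_rot)
trans_err = math.dist(trans, target_trans)
print(f"rotation error: {rot_err:.2f} deg, translation error: {trans_err:.3f} m")
```

Because this toy refiner halves the remaining error at every step, three iterations reduce the rotation error from 40 degrees to 5 degrees and the translation error from 8 cm to 1 cm. Rotation error in degrees and translation error in metres are computed separately here, which mirrors the abstract's point that rotation and position estimation differ and are handled by separate decoder heads.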


Pages: 20

