MV6D: Multi-View 6D Pose Estimation on RGB-D Frames Using a Deep Point-wise Voting Network

Cited by: 12

Authors:

Duffhauss, Fabian [1, 2]

Demmler, Tobias [3]

Neumann, Gerhard [4]

Affiliations:

[1] Bosch Ctr Artificial Intelligence, Renningen, Germany

[2] Univ Tubingen, Tubingen, Germany

[3] Robert Bosch GmbH, Stuttgart, Germany

[4] Karlsruhe Inst Technol, Karlsruhe, Germany

Source:

2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2022

Keywords:

OBJECT RECOGNITION; IMAGE; REGISTRATION;

DOI:

10.1109/IROS47612.2022.9982268

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Estimating the 6D poses of objects is an essential computer vision task. However, most conventional approaches rely on camera data from a single perspective and therefore suffer from occlusions. We overcome this issue with MV6D, a novel multi-view 6D pose estimation method that accurately predicts the 6D poses of all objects in a cluttered scene from RGB-D images taken from multiple perspectives. We base our approach on the PVN3D network, which uses a single RGB-D image to predict keypoints of the target objects. We extend this approach by using a combined point cloud from multiple views and fusing the images from each view with a DenseFusion layer. In contrast to current multi-view pose detection networks such as CosyPose, MV6D can learn the fusion of multiple perspectives in an end-to-end manner and requires neither multiple prediction stages nor subsequent fine-tuning of the prediction. Furthermore, we present three novel photorealistic datasets of cluttered scenes with heavy occlusions. All of them contain RGB-D images from multiple perspectives as well as ground truth for instance semantic segmentation and 6D pose estimation. MV6D significantly outperforms the state of the art in multi-view 6D pose estimation, even when the camera poses are known only inaccurately. We also show that our approach is robust to dynamic camera setups and that its accuracy increases incrementally with the number of perspectives.
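As a rough illustration of what a combined point cloud from multiple views involves, the sketch below back-projects depth maps with a pinhole camera model and merges the resulting clouds in a shared world frame. It is not the MV6D implementation: the function names `backproject_depth` and `merge_views`, the intrinsics parameters, and the assumption that each camera pose is given as a 4x4 camera-to-world matrix are all hypothetical, and the network itself (keypoint voting, DenseFusion layer) is not covered.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Turn a depth map (H, W) into an (N, 3) point cloud in the camera frame.

    Assumes a pinhole camera; pixels with depth <= 0 are treated as invalid.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row grids
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def merge_views(clouds, cam_to_world):
    """Move each camera-frame cloud into the world frame and concatenate them.

    clouds:       list of (N_i, 3) arrays
    cam_to_world: list of 4x4 homogeneous transforms (assumed known)
    """
    merged = []
    for pts, T in zip(clouds, cam_to_world):
        rotation, translation = T[:3, :3], T[:3, 3]
        merged.append(pts @ rotation.T + translation)
    return np.concatenate(merged, axis=0)


if __name__ == "__main__":
    depth = np.ones((2, 2))                    # toy depth map, 1 m everywhere
    cloud = backproject_depth(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    shifted = np.eye(4)
    shifted[0, 3] = 1.0                        # second camera offset 1 m along x
    world = merge_views([cloud, cloud], [np.eye(4), shifted])
    print(world.shape)
```

Inaccurate camera poses, which the abstract mentions as a robustness case, would show up here as errors in the `cam_to_world` matrices and hence as misaligned clouds after merging.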


Pages: 3568-3575

Page count: 8
