Reference Pose Generation for Long-term Visual Localization via Learned Features and View Synthesis

Cited by: 82

Authors:

Zhang, Zichao [1]

Sattler, Torsten [2]

Scaramuzza, Davide [1]

Affiliations:

[1] Univ Zurich, Robot & Percept Grp, Zurich, Switzerland

[2] Czech Tech Univ, Czech Inst Informat Robot & Cybernet, Prague, Czech Republic

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2021 / Vol. 129 / Issue 04

Keywords:

Visual localization; Benchmark construction; Learned local features; Place recognition

DOI:

10.1007/s11263-020-01399-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual localization is one of the key enabling technologies for autonomous driving and augmented reality. High-quality datasets with accurate 6 Degree-of-Freedom (DoF) reference poses are the foundation for benchmarking and improving existing methods. Traditionally, reference poses have been obtained via Structure-from-Motion (SfM). However, SfM itself relies on local features, which are prone to fail when images are taken under different conditions, e.g., day/night changes. At the same time, manually annotating feature correspondences is not scalable and potentially inaccurate. In this work, we propose a semi-automated approach to generate reference poses based on feature matching between renderings of a 3D model and real images via learned features. Given an initial pose estimate, our approach iteratively refines the pose based on feature matches against a rendering of the model from the current pose estimate. We significantly improve the nighttime reference poses of the popular Aachen Day-Night dataset, showing that state-of-the-art visual localization methods perform better (up to 47%) than predicted by the original reference poses. We extend the dataset with new nighttime test images, provide uncertainty estimates for our new reference poses, and introduce a new evaluation criterion. We will make our reference poses and our framework publicly available upon publication.

Citation

Pages: 821-844

Page count: 24

