Investigating the Role of Image Retrieval for Visual Localization

Cited by: 15
Authors
Humenberger, Martin [1]
Cabon, Yohann [1]
Pion, Noe [1]
Weinzaepfel, Philippe [1]
Lee, Donghwan [2]
Guerin, Nicolas [1]
Sattler, Torsten [3]
Csurka, Gabriela [1]
Affiliations
[1] NAVER LABS Europe, Meylan, France
[2] NAVER LABS, Seongnam Si, South Korea
[3] Czech Technical University, Czech Institute of Informatics, Robotics and Cybernetics, Prague, Czech Republic
Keywords
Visual localization; Image retrieval; Benchmark; Landmark retrieval; Place recognition; Camera pose estimation; Query expansion; Recognition; Features; Model
DOI
10.1007/s11263-022-01615-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two purposes: (1) provide an approximate pose estimate or (2) determine which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for both of them. These algorithms are often trained for the goal of retrieving the same landmark under a large range of viewpoint changes which often differs from the requirements of visual localization. In order to investigate the consequences for visual localization, this paper focuses on understanding the role of image retrieval for multiple visual localization paradigms. First, we introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets using localization performance as metric. Second, we investigate several definitions of "ground truth" for image retrieval. Using these definitions as upper bounds for the visual localization paradigms, we show that there is still significant room for improvement. Third, using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates only for some but not all paradigms to localization performance. Finally, we analyze the effects of blur and dynamic scenes in the images. We conclude that there is a need for retrieval approaches specifically designed for localization paradigms. Our benchmark and evaluation protocols are available at https://github.com/naver/kapture-localization.
Pages: 1811-1836
Number of pages: 26