Visual Localization using Imperfect 3D Models from the Internet

Cited by: 11
Authors:
Panek, Vojtech [1 ,2 ]
Kukelova, Zuzana [3 ]
Sattler, Torsten [2 ]
Affiliations:
[1] Czech Tech Univ, Fac Elect Engn, Prague, Czech Republic
[2] Czech Tech Univ, Czech Inst Informat Robot & Cybernet, Prague, Czech Republic
[3] Czech Tech Univ, Visual Recognit Grp, Fac Elect Engn, Prague, Czech Republic
Source:
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Keywords:
POSE ESTIMATION; IMAGE; ALIGNMENT; OBJECTS; WORLD
DOI
10.1109/CVPR52729.2023.01266
Chinese Library Classification:
TP18 [Artificial Intelligence Theory]
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract:
Visual localization is a core component in many applications, including augmented reality (AR). Localization algorithms compute the camera pose of a query image w.r.t. a scene representation, which is typically built from images. This often requires capturing and storing large amounts of data, followed by running Structure-from-Motion (SfM) algorithms. An interesting, and underexplored, source of data for building scene representations are 3D models that are readily available on the Internet, e.g., hand-drawn CAD models, 3D models generated from building footprints, or from aerial images. These models allow performing visual localization right away, without the time-consuming scene capturing and model building steps. Yet, they also come with challenges, as the available 3D models are often imperfect reflections of reality. E.g., the models might have only generic textures or no textures at all, might provide only a simple approximation of the scene geometry, or might be stretched. This paper studies how the imperfections of these models affect localization accuracy. We create a new benchmark for this task and provide a detailed experimental evaluation based on multiple 3D models per scene. We show that 3D models from the Internet are promising as an easy-to-obtain scene representation. At the same time, there is significant room for improvement for visual localization pipelines. To foster research on this interesting and challenging task, we release our benchmark at v-pnk.github.io/cadloc.
Pages: 13175 - 13186 (12 pages)