Visual Localization via Few-Shot Scene Region Classification

Cited by: 16
Authors
Dong, Siyan [1, 3]
Wang, Shuzhe [2, 3]
Zhuang, Yixin [4 ]
Kannala, Juho [2 ]
Pollefeys, Marc [3, 5]
Chen, Baoquan [6 ]
Affiliations
[1] Shandong Univ, Jinan, Shandong, Peoples R China
[2] Aalto Univ, Espoo, Finland
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] Fuzhou Univ, Fuzhou, Peoples R China
[5] Microsoft, Zurich, Switzerland
[6] Peking Univ, Beijing, Peoples R China
Source
2022 INTERNATIONAL CONFERENCE ON 3D VISION, 3DV | 2022
Funding
Academy of Finland
Keywords
DOI
10.1109/3DV57658.2022.00051
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual (re)localization addresses the problem of estimating the 6-DoF (degrees of freedom) camera pose of a query image captured in a known scene, a key building block of many computer vision and robotics applications. Recent structure-based localization methods solve this problem by training neural networks to memorize the mapping from image pixels to scene coordinates, which yields the 2D-3D correspondences used for camera pose optimization. However, such memorization requires training on a large number of posed images for each scene, which is costly and inefficient. In contrast, a few images are usually sufficient to cover the main regions of a scene for a human operator to perform visual localization. In this paper, we propose a scene region classification approach that achieves fast and effective scene memorization from few-shot images. Our insight is to leverage a) a pre-learned feature extractor, b) a scene region classifier, and c) a meta-learning strategy to accelerate training while mitigating overfitting. We evaluate our method on both indoor and outdoor benchmarks. The experiments validate the effectiveness of our method in the few-shot setting, and the training time is significantly reduced to only a few minutes.
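The link between region classification and 2D-3D correspondences can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each scene region is represented by a single 3D center, and the names `classify_regions`, `build_correspondences`, `region_centers`, and `min_score` are hypothetical. Each pixel is assigned its highest-scoring region, low-confidence pixels are dropped, and the remaining pixels are paired with the 3D center of their region. Such pairs could then be passed to a pose solver (for example PnP with RANSAC), which is outside this sketch.

```python
def classify_regions(pixel_scores):
    """Assign each pixel its highest-scoring region index and that score."""
    labels = {}
    for pixel, scores in pixel_scores.items():
        best = max(range(len(scores)), key=lambda i: scores[i])
        labels[pixel] = (best, scores[best])
    return labels


def build_correspondences(pixel_scores, region_centers, min_score=0.5):
    """Pair confident pixels (2D) with the center of their region (3D).

    Assumption for illustration: one 3D center per region.
    """
    correspondences = []
    for pixel, (region, score) in sorted(classify_regions(pixel_scores).items()):
        if score < min_score:
            continue  # skip pixels whose best region is not confident enough
        correspondences.append((pixel, region_centers[region]))
    return correspondences


# Toy input: three pixels, three candidate regions (all values made up).
pixel_scores = {
    (10, 20): [0.1, 0.7, 0.2],
    (30, 40): [0.6, 0.3, 0.1],
    (50, 60): [0.4, 0.3, 0.3],
}
region_centers = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 1.0, 3.0)]

matches = build_correspondences(pixel_scores, region_centers)
for pixel, point in matches:
    print(pixel, "->", point)
```

With the default threshold, the third pixel is discarded because its best score is below 0.5; lowering `min_score` keeps it.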
Pages: 393-402
Page count: 10