Visual Localization via Few-Shot Scene Region Classification

Cited by: 16

Authors:

Dong, Siyan [1,3]

Wang, Shuzhe [2,3]

Zhuang, Yixin [4]

Kannala, Juho [2]

Pollefeys, Marc [3,5]

Chen, Baoquan [6]

Affiliations:

[1] Shandong Univ, Jinan, Shandong, Peoples R China

[2] Aalto Univ, Espoo, Finland

[3] Swiss Fed Inst Technol, Zurich, Switzerland

[4] Fuzhou Univ, Fuzhou, Peoples R China

[5] Microsoft, Zurich, Switzerland

[6] Peking Univ, Beijing, Peoples R China

Source:

2022 International Conference on 3D Vision (3DV) | 2022

Funding:

Academy of Finland

DOI:

10.1109/3DV57658.2022.00051

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Visual (re)localization addresses the problem of estimating the 6-DoF (six degrees of freedom) camera pose of a query image captured in a known scene, a key building block of many computer vision and robotics applications. Recent advances in structure-based localization solve this problem by using neural networks to memorize the mapping from image pixels to scene coordinates, which yields 2D-3D correspondences for camera pose optimization. However, such memorization requires training on a large number of posed images in each scene, which is heavy and inefficient. In contrast, a few images are usually sufficient to cover the main regions of a scene for a human operator to perform visual localization. In this paper, we propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images. Our insight is to leverage (a) a pre-learned feature extractor, (b) a scene region classifier, and (c) a meta-learning strategy to accelerate training while mitigating overfitting. We evaluate our method on both indoor and outdoor benchmarks. The experiments validate the effectiveness of our method in the few-shot setting, and the training time is significantly reduced to only a few minutes.

Pages: 393-402

Page count: 10

References (54 in total):

[21] Low-shot Visual Recognition by Shrinking and Hallucinating Features. Hariharan, Bharath; Girshick, Ross. 2017 IEEE International Conference on Computer Vision (ICCV), 2017: 3037-3046.

[22] VS-Net: Voting with Segmentation for Visual Localization. Huang, Zhaoyang; Zhou, Han; Li, Yijin; Yang, Bangbang; Xu, Yan; Zhou, Xiaowei; Bao, Hujun; Zhang, Guofeng; Li, Hongsheng. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021: 6097-6107.

[23] An Efficient Algebraic Solution to the Perspective-Three-Point Problem. Ke, Tong; Roumeliotis, Stergios I. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 4618-4626.

[24] Geometric Loss Functions for Camera Pose Regression with Deep Learning. Kendall, Alex; Cipolla, Roberto. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 6555-6564.

[25] PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. Kendall, Alex; Grimes, Matthew; Cipolla, Roberto. 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 2938-2946.

[26] King DB, 2015, ACS SYM SER, V1214, P1, DOI 10.1021/bk-2015-1214.ch001

[27] Koch G., 2015, ICML DEEP LEARN WORK, V2

[28] Li XT, 2018, arXiv, DOI arXiv:1802.03237

[29] Li Xiaotian, 2020, CVPR, P11983

[30] Nichol A, 2018, arXiv, DOI arXiv:1803.02999
