Modern Backbone for Efficient Geo-localization

Cited by: 2

Authors:

Zhu, Runzhe [1]

Yang, Mingze [2]

Zhang, Kaiyu [2]

Wu, Fei [2]

Yin, Ling [2]

Zhang, Yujin [2]

Affiliations:

[1] Zhejiang Univ, Shanghai Univ Engn Sci, Jiaxing Res Inst, Hangzhou, Zhejiang, Peoples R China

[2] Shanghai Univ Engn Sci, Shanghai, Peoples R China

Source:

PROCEEDINGS OF THE 2023 WORKSHOP ON UAVS IN MULTIMEDIA: CAPTURING THE WORLD FROM A NEW PERSPECTIVE, UAVM 2023 | 2023

Keywords:

Geo-localization; Cross-view Matching; Transformer; Modern Backbone; Knowledge Distillation

DOI:

10.1145/3607834.3616562

Chinese Library Classification (CLC) number:

V [Aviation, Aerospace]

Discipline classification code:

08 ; 0825 ;

Abstract:

With the development of autonomous driving technology, visual geo-localization has attracted steadily growing attention. The key problem is how to match the correct image pairs across different viewpoints. Existing geo-localization methods focus on designing complex attention mechanisms on top of traditional backbones, e.g., VGG and ResNet, but neglect the importance of the backbone network itself. In this article, we propose a modern backbone based geo-localization method (MBEG). MBEG adopts the recent vision foundation network EVA-02 as its backbone, which has been fully trained on large datasets. In addition, a feature rotate encoding strategy is presented to eliminate the effects of image rotation. We also apply knowledge distillation to reduce the number of network parameters for practical deployment. Our method achieves excellent performance on the University-1652 dataset, and our solution ranked first in the UAVs in Multimedia Challenge [25] on the University-160k dataset.

Citation

Pages: 31-37

Page count: 7

30 references in total

[1] Bao Hangbo, 2021, arXiv, DOI 10.48550/arXiv.2106.08254
[2] Bui DV, 2022, J ROBOT NETW ARTIF L, V9, P275
[3] Caron, Mathilde; Touvron, Hugo; Misra, Ishan; Jegou, Herve; Mairal, Julien; Bojanowski, Piotr; Joulin, Armand. Emerging Properties in Self-Supervised Vision Transformers. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 9630-9640
[4] Dai, Ming; Hu, Jianhong; Zhuang, Jiedong; Zheng, Enhui. A Transformer-Based Feature Segmentation and Region Alignment Method for UAV-View Geo-Localization. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32(7): 4376-4389
[5] Deuser F, 2023, arXiv, arXiv:2303.11851
[6] Dosovitskiy A, 2021, arXiv, arXiv:2010.11929, DOI 10.48550/arXiv.2010.11929
[7] Fang, Yuxin; Wang, Wen; Xie, Binhui; Sun, Quan; Wu, Ledell; Wang, Xinggang; Huang, Tiejun; Wang, Xinlong; Cao, Yue. EVA: Exploring the Limits of Masked Visual Representation Learning at Scale. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 19358-19369
[8] Fang YX, 2023, arXiv, arXiv:2303.11331, DOI 10.48550/arXiv.2303.11331
[9] Gou, Jianping; Yu, Baosheng; Maybank, Stephen J.; Tao, Dacheng. Knowledge Distillation: A Survey. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129(6): 1789-1819
[10] He, Kaiming; Chen, Xinlei; Xie, Saining; Li, Yanghao; Dollar, Piotr; Girshick, Ross. Masked Autoencoders Are Scalable Vision Learners. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 15979-15988
