Modern Backbone for Efficient Geo-localization

Cited by: 2
Authors
Zhu, Runzhe [1]
Yang, Mingze [2]
Zhang, Kaiyu [2]
Wu, Fei [2]
Yin, Ling [2]
Zhang, Yujin [2]
Affiliations
[1] Zhejiang Univ, Shanghai Univ Engn Sci, Jiaxing Res Inst, Hangzhou, Zhejiang, Peoples R China
[2] Shanghai Univ Engn Sci, Shanghai, Peoples R China
Source
PROCEEDINGS OF THE 2023 WORKSHOP ON UAVS IN MULTIMEDIA: CAPTURING THE WORLD FROM A NEW PERSPECTIVE, UAVM 2023 | 2023
Keywords
Geo-localization; Cross-view Matching; Transformer; Modern Backbone; Knowledge Distillation
DOI
10.1145/3607834.3616562
Chinese Library Classification (CLC) number
V [Aviation and Aerospace]
Discipline classification code
08; 0825
Abstract
With the development of autonomous driving technology, visual geo-localization has attracted steadily growing attention. The key challenge is matching the correct image pair across different viewpoints. Existing geo-localization methods focus on designing complex attention mechanisms on top of traditional backbones such as VGG and ResNet, but neglect the importance of the backbone network itself. In this article, we propose a modern-backbone-based geo-localization method (MBEG). MBEG adopts EVA-02, a recent vision foundation model that has been extensively pre-trained on large-scale datasets, as its backbone. In addition, we present a feature rotate encoding strategy to eliminate the effects of image rotation. We also apply knowledge distillation to reduce the number of network parameters for practical deployment. Our method performs strongly on the University-1652 dataset, and our solution ranked first in the UAVs in Multimedia Challenge [25] on the University-160k dataset.
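The abstract does not say which distillation objective MBEG uses. As a hedged illustration only, the sketch below shows the generic soft-label formulation of knowledge distillation (Hinton et al., 2015): a small student is trained to match the temperature-softened class distribution of a large teacher (presumably the full EVA-02-based model, though the abstract does not state this), alongside ordinary cross-entropy on the ground-truth label. The function names `log_softmax` and `kd_loss`, the defaults `temperature=4.0` and `alpha=0.5`, and the toy "location class" logits are hypothetical choices, not taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax, computed stably via the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Soft-label knowledge-distillation loss for a single sample.

    Generic Hinton-style formulation (an assumption, not MBEG's published loss):
      alpha * T^2 * KL(p_teacher^T || p_student^T) + (1 - alpha) * CE(student, label)
    The T^2 factor keeps the soft-target gradients on a scale comparable to the
    hard-label term when the temperature T is raised.
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Ordinary cross-entropy of the student (temperature 1) on the hard label.
    ce = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Usage with toy logits over three hypothetical location classes.
teacher = [3.0, 0.2, -2.0]
student = [2.0, 0.5, -1.0]
print(kd_loss(student, teacher, label=0))
```

Setting `alpha=1.0` trains purely on the teacher's soft targets, while `alpha=0.0` reduces to plain supervised training of the smaller network; feature-level distillation (matching embeddings rather than class distributions) is an equally plausible alternative for a retrieval model.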
Pages: 31-37
Number of pages: 7
Related papers
30 entries in total
  • [1] Bao H, 2021, arXiv, DOI 10.48550/arXiv.2106.08254
  • [2] Bui DV, 2022, J Robot Netw Artif L, vol. 9, p. 275
  • [3] Caron M, Touvron H, Misra I, Jegou H, Mairal J, Bojanowski P, Joulin A, 2021, Emerging Properties in Self-Supervised Vision Transformers, 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), pp. 9630-9640
  • [4] Dai M, Hu J, Zhuang J, Zheng E, 2022, A Transformer-Based Feature Segmentation and Region Alignment Method for UAV-View Geo-Localization, IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 7, pp. 4376-4389
  • [5] Deuser F, 2023, arXiv, arXiv:2303.11851
  • [6] Dosovitskiy A, 2021, arXiv, DOI 10.48550/arXiv.2010.11929
  • [7] Fang Y, Wang W, Xie B, Sun Q, Wu L, Wang X, Huang T, Wang X, Cao Y, 2023, EVA: Exploring the Limits of Masked Visual Representation Learning at Scale, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19358-19369
  • [8] Fang YX, 2023, arXiv, DOI 10.48550/arXiv.2303.11331
  • [9] Gou J, Yu B, Maybank SJ, Tao D, 2021, Knowledge Distillation: A Survey, International Journal of Computer Vision, vol. 129, no. 6, pp. 1789-1819
  • [10] He K, Chen X, Xie S, Li Y, Dollar P, Girshick R, 2022, Masked Autoencoders Are Scalable Vision Learners, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), pp. 15979-15988