GeoViewMatch: A Multi-Scale Feature-Matching Network for Cross-View Geo-Localization Using Swin-Transformer and Contrastive Learning

Cited by: 0
Authors
Zhang, Wenhui [1]
Zhong, Zhinong [1]
Chen, Hao [1, 2]
Jing, Ning [1, 2]
Affiliations
[1] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha 410073, Peoples R China
[2] Minist Nat Resources, Key Lab Nat Resources Monitoring & Supervis Southe, Changsha 410073, Peoples R China
Keywords
cross-view geo-localization; contrastive learning; multi-scale feature extraction; remote sensing
DOI
10.3390/rs16040678
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Cross-view geo-localization aims to locate street-view images by matching them against a collection of GPS-tagged remote sensing (RS) images. The task is highly challenging because of the significant viewpoint and appearance differences between street-view and RS images. Although deep learning-based methods dominate cross-view geo-localization, existing models have difficulty extracting comprehensive, meaningful features from both image domains. As a result, they fail to establish accurate and robust dependencies between street-view images and the corresponding RS images. To address these issues, this paper proposes a novel, lightweight neural network for cross-view geo-localization. First, to capture more diverse information, we propose a module that extracts multi-scale features from images. Second, we introduce contrastive learning and design a contrastive loss to further improve robustness in extracting and aligning meaningful multi-scale features. Finally, we conduct comprehensive experiments on two open benchmarks. The experimental results demonstrate the superiority of the proposed method over state-of-the-art methods.
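The abstract does not spell out the form of the contrastive loss, so the following is only an illustrative sketch of a generic symmetric InfoNCE-style loss between street-view and RS embeddings, not the paper's actual formulation. The function names, the temperature value, and the toy two-dimensional embeddings are all assumptions made for the example; embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.1):
    """Mean cross-entropy of matching queries[i] to keys[i] against all keys.

    Hypothetical helper: cosine similarity is used as the score, and the
    other keys in the batch act as negatives.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all candidate keys.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum - logits[i]
    return total / len(q)


def symmetric_contrastive_loss(street, satellite, temperature=0.1):
    """Average of the street-to-RS and RS-to-street InfoNCE terms."""
    return 0.5 * (
        info_nce(street, satellite, temperature)
        + info_nce(satellite, street, temperature)
    )


# Toy embeddings (assumed values): row i of each list is a matching pair.
street = [[1.0, 0.1], [0.0, 1.0]]
satellite = [[0.9, 0.0], [0.1, 1.0]]

aligned = symmetric_contrastive_loss(street, satellite)
shuffled = symmetric_contrastive_loss(street, satellite[::-1])
print(aligned < shuffled)
```

In this sketch the loss is lower when each street-view embedding sits closest to its own RS embedding than when the pairs are swapped, which is the alignment behaviour a contrastive objective is meant to reward.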
Pages: 19