GeoViewMatch: A Multi-Scale Feature-Matching Network for Cross-View Geo-Localization Using Swin-Transformer and Contrastive Learning

Cited by: 0

Authors:

Zhang, Wenhui [1]

Zhong, Zhinong [1]

Chen, Hao [1,2]

Jing, Ning [1,2]

Affiliations:

[1] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha 410073, Peoples R China

[2] Minist Nat Resources, Key Lab Nat Resources Monitoring & Supervis Southe, Changsha 410073, Peoples R China

Source:

REMOTE SENSING | 2024 / Vol. 16 / No. 04

Keywords:

cross-view geo-localization; contrastive learning; multi-scale feature extraction; remote sensing;

DOI:

10.3390/rs16040678

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

Cross-view geo-localization aims to locate street-view images by matching them against a collection of GPS-tagged remote sensing (RS) images. The task is highly challenging because street-view and RS images differ significantly in viewpoint and appearance. Although deep learning-based methods dominate cross-view geo-localization, existing models have difficulty extracting comprehensive, meaningful features from both image domains. As a result, they fail to establish accurate and robust dependencies between street-view images and the corresponding RS images. To address these issues, this paper proposes a novel and lightweight neural network for cross-view geo-localization. First, to capture more diverse information, we propose a module for extracting multi-scale features from images. Second, we introduce contrastive learning and design a contrastive loss to further enhance robustness in extracting and aligning meaningful multi-scale features. Finally, we conduct comprehensive experiments on two open benchmarks. The experimental results demonstrate the superiority of the proposed method over state-of-the-art methods.
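
The abstract does not give the form of the contrastive loss, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a generic symmetric InfoNCE loss over paired street-view and RS embeddings, followed by nearest-neighbor retrieval by cosine similarity. All function names, the `temperature` value, and the toy two-dimensional embeddings are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def similarity_matrix(street, rs):
    """Cosine similarity between every street-view and every RS embedding."""
    s = [l2_normalize(v) for v in street]
    r = [l2_normalize(v) for v in rs]
    return [[sum(a * b for a, b in zip(u, w)) for w in r] for u in s]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_info_nce(street, rs, temperature=0.1):
    """Generic symmetric InfoNCE loss (an assumption, not the paper's loss).

    street[i] and rs[i] are treated as a matching pair; every other
    combination in the batch acts as a negative.
    """
    sim = similarity_matrix(street, rs)
    logits = [[x / temperature for x in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # RS -> street direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


def retrieve(query, rs):
    """Index of the RS embedding most similar to one street-view query."""
    sims = similarity_matrix([query], rs)[0]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy embeddings (hypothetical): correctly aligned pairs give a lower loss.
street = [[1.0, 0.0], [0.0, 1.0]]
rs_aligned = [[0.9, 0.1], [0.1, 0.9]]
rs_swapped = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = symmetric_info_nce(street, rs_aligned)
loss_swapped = symmetric_info_nce(street, rs_swapped)
```

In this sketch the loss is small when each street-view embedding is closest to its own RS embedding and large when the pairs are mismatched, which is the alignment behavior a contrastive objective is meant to encourage. At query time, `retrieve` returns the index of the best-matching RS image, whose GPS tag would serve as the location estimate.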

Pages: 19
