An Efficient Method based on Multi-view Semantic Alignment for Cross-view Geo-localization

Cited by: 1
Authors
Wang, Yifeng [1 ]
Xia, Yamei [1 ]
Lu, Tianbo [1 ]
Zhang, Xiaoyan [1 ]
Yao, Wenbin [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Natl Pilot Software Engineering Sch, Beijing, Peoples R China
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023
Funding
National Natural Science Foundation of China
Keywords
Geo-localization; Image Retrieval; Transformer; Semantic Alignment;
DOI
10.1109/IJCNN54540.2023.10191537
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Cross-view geo-localization aims to retrieve the most relevant images across different views. Its biggest challenges are the visual differences between views and the location shifts that arise in practical applications. Existing methods usually extract fine-grained features of the retrieval target and match them by semantic alignment. Transformer-based approaches can attend to more contextual information than CNN-based approaches and can also learn the geometric correspondence between two viewpoint images directly through positional encoding. However, existing methods fail to fully utilize the information from different viewpoints, and their models do not understand contextual information sufficiently. To address these issues, we propose an efficient method that makes full use of cross-view image information and feature fusion, divided into two branches: Aerial-View Local-Feature Cross-Fusion (ALCF) and Multi-View Global-Feature Cross-Fusion (MGCF). By observing the characteristics of the aerial and street views, we perform a targeted fusion of global and local features from different viewpoints. In addition, we introduce a multi-view semantic alignment module, which mitigates the noise introduced when aerial-view and street-view images are semantically aligned. Experiments show that our proposed method achieves excellent performance on both the drone-view target localization and drone navigation tasks on the University-1652 dataset.
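The abstract describes fusing global and local features from different views and then retrieving matches by feature comparison. As a rough, hypothetical sketch of that retrieval pipeline (this is not the authors' implementation; all function and variable names are invented, and the fusion here is a simple concatenate-and-average stand-in for the ALCF/MGCF branches):

```python
import numpy as np

def fuse_features(global_feat, local_feats):
    """Hypothetical fusion: concatenate a global descriptor with the
    mean of the local part descriptors, then L2-normalize the result."""
    fused = np.concatenate([global_feat, local_feats.mean(axis=0)])
    return fused / np.linalg.norm(fused)

def retrieve(query, gallery):
    """Rank gallery descriptors by cosine similarity to the query.
    Descriptors are L2-normalized, so a dot product equals cosine similarity."""
    sims = gallery @ query
    return np.argsort(-sims)  # indices of gallery images, best match first

# Toy example: one street-view query against three aerial-view candidates,
# each with a 128-d global descriptor and four 128-d local descriptors.
rng = np.random.default_rng(0)
query = fuse_features(rng.normal(size=128), rng.normal(size=(4, 128)))
gallery = np.stack([
    fuse_features(rng.normal(size=128), rng.normal(size=(4, 128)))
    for _ in range(3)
])
ranking = retrieve(query, gallery)
```

In a real system the descriptors would come from the trained two-branch network rather than random vectors, but the ranking step for cross-view image retrieval typically reduces to a nearest-neighbor search like this.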
Pages: 8
References (42 total)
[1] Bansal M., 2011, P 19 ACM INT C MULT, P1125
[2] Bansal Mayank, 2011, P 19 ACM INT C MULT
[3] Bui DV, 2022, J ROBOT NETW ARTIF L, V9, P275
[4] Cai Sudong, 2019, P IEEE CVF INT C COM
[5] Castaldo F.; Zamir A.; Angst R.; Palmieri F.; Savarese S. Semantic Cross-View Matching. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOP (ICCVW), 2015: 1044-1052
[6] Chen L.-C.; Zhu Y.; Papandreou G.; Schroff F.; Adam H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211: 833-851
[7] Dai M.; Hu J.; Zhuang J.; Zheng E. A Transformer-Based Feature Segmentation and Region Alignment Method for UAV-View Geo-Localization. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32(07): 4376-4389
[8] Ding L.; Zhou J.; Meng L.; Long Z. A Practical Cross-View Image Matching Method between UAV and Satellite for UAV-Based Geo-Localization. REMOTE SENSING, 2021, 13(01): 1-22
[9] Dosovitskiy A., 2020, preprint
[10] Gascón J, 2021, AV INVESTIG EDUC MAT, P23