A Novel Geo-Localization Method for UAV and Satellite Images Using Cross-View Consistent Attention

Cited by: 9
Authors
Cui, Zhuofan [1]
Zhou, Pengwei [1]
Wang, Xiaolong [1]
Zhang, Zilun [2]
Li, Yingxuan [1]
Li, Hongbo [3]
Zhang, Yu [1]
Affiliations
[1] Zhejiang Univ, Coll Control Sci & Engn, State Key Lab Ind Control Technol, Hangzhou 310012, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci, Hangzhou 310012, Peoples R China
[3] Beijing Geekplus Technol Co Ltd, 7-F,Block D,Beijing Cultural & Creat Bldg,30 Beiyu, Beijing 100107, Peoples R China
Keywords
geo-localization; UAV; satellite; transformer; cross-view
DOI
10.3390/rs15194667
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Geo-localization is widely used to obtain the longitude and latitude of an unmanned aerial vehicle (UAV) for navigation in outdoor flight. Because GPS signals can be interfered with or blocked, image-retrieval-based methods, which are less susceptible to interference, have received extensive attention in recent years. Geo-localization between UAV and satellite views can be achieved by matching drone images against pre-obtained, GPS-tagged satellite images taken from a different perspective. In this paper, an image transformation technique is used to extract cross-view geo-localization information from UAV and satellite images. A single-stage training method for UAV and satellite geo-localization is proposed for the first time; it performs cross-view feature extraction and image retrieval simultaneously and achieves higher accuracy than existing multi-stage training techniques. A novel piecewise soft-margin triplet loss function is designed to keep model parameters from being trapped in suboptimal sets caused by the lack of constraints on positive and negative samples. The results show that the proposed loss function improves image retrieval accuracy and yields better convergence. Moreover, a data augmentation method for satellite images is proposed to address the imbalance in the numbers of image samples. On the University-1652 benchmark, the proposed method achieves state-of-the-art results, with improvements of 6.67% in recall rate (R@1) and 6.13% in average precision (AP). All code will be released publicly to promote reproducibility.
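To make the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows the standard (non-piecewise) soft-margin triplet loss, log(1 + exp(alpha * (d_pos - d_neg))), together with textbook definitions of R@1 and AP for a single query. The piecewise variant proposed in the paper is not specified in this record and is not reproduced here. The function names, the scale factor alpha, and the toy labels are assumptions made for illustration, and the benchmark's official evaluation script may compute AP differently.

```python
import math


def soft_margin_triplet_loss(d_pos, d_neg, alpha=10.0):
    """Standard soft-margin triplet loss for one triplet.

    d_pos: distance between the query and its matching (positive) image.
    d_neg: distance between the query and a non-matching (negative) image.
    alpha: scale factor (assumed value, chosen only for illustration).
    """
    x = alpha * (d_pos - d_neg)
    # Numerically stable softplus: log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def recall_at_k(ranked_labels, query_label, k=1):
    """Return 1.0 if a correct match appears among the top-k results, else 0.0."""
    return 1.0 if query_label in ranked_labels[:k] else 0.0


def average_precision(ranked_labels, query_label):
    """Textbook AP: mean of precision values at each rank where a match occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy example: gallery labels sorted by ascending distance to the query.
ranking = ["building_7", "building_3", "building_3"]
r1 = recall_at_k(ranking, "building_3", k=1)
ap = average_precision(ranking, "building_3")
loss = soft_margin_triplet_loss(d_pos=0.4, d_neg=0.9)
```

In this sketch the loss shrinks toward zero as the positive pair becomes closer than the negative pair, and equals log(2) when the two distances are equal. Averaging recall_at_k and average_precision over all queries gives the dataset-level figures of the kind reported in the abstract.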
Pages: 20