Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching

Cited by: 141
Authors
Shi, Yujiao [1, 2]
Yu, Xin [1, 2, 3]
Campbell, Dylan [1, 2]
Li, Hongdong [1, 2]
Affiliations
[1] Australian Natl Univ, Canberra, ACT, Australia
[2] Australian Ctr Robot Vis, Brisbane, Qld, Australia
[3] Univ Technol Sydney, Sydney, NSW, Australia
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2020
Funding
Australian Research Council;
Keywords
DOI
10.1109/CVPR42600.2020.00412
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-view geo-localization is the problem of estimating the position and orientation (latitude, longitude and azimuth angle) of a camera at ground level given a large-scale database of geo-tagged aerial (e.g., satellite) images. Existing approaches treat the task as a pure location estimation problem by learning discriminative feature descriptors, but neglect orientation alignment. It is well-recognized that knowing the orientation between ground and aerial images can significantly reduce matching ambiguity between these two views, especially when the ground-level images have a limited Field of View (FoV) instead of a full field-of-view panorama. Therefore, we design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization. In particular, we address the cross-view domain gap by applying a polar transform to the aerial images to approximately align the images up to an unknown azimuth angle. Then, a two-stream convolutional network is used to learn deep features from the ground and polar-transformed aerial images. Finally, we obtain the orientation by computing the correlation between cross-view features, which also provides a more accurate measure of feature similarity, improving location recall. Experiments on standard datasets demonstrate that our method significantly improves state-of-the-art performance. Remarkably, we improve the top-1 location recall rate on the CVUSA dataset by a factor of 1.5× for panoramas with known orientation, by a factor of 3.3× for panoramas with unknown orientation, and by a factor of 6× for 180°-FoV images with unknown orientation.
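The two geometric steps named in the abstract, the polar transform of the aerial image and the correlation used to recover the azimuth, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `polar_transform` and `estimate_shift`, the nearest-neighbour sampling, the convention that row 0 of the polar image maps to the outer ring of the aerial image with column 0 pointing "up", and the use of random arrays in place of learned CNN features are all assumptions made for illustration.

```python
import numpy as np


def polar_transform(aerial, height, width):
    """Resample a square aerial image (A x A or A x A x C) into a
    height x width polar image (hypothetical helper, nearest-neighbour).

    Assumed convention: row 0 samples the outer ring, the last row is
    near the image centre, and column 0 points towards the top edge.
    """
    size = aerial.shape[0]
    rows = np.arange(height).reshape(-1, 1)   # radial index
    cols = np.arange(width).reshape(1, -1)    # angular index
    radius = (size / 2.0) * (height - rows) / height
    theta = 2.0 * np.pi * cols / width
    src_x = size / 2.0 + radius * np.sin(theta)   # aerial column
    src_y = size / 2.0 - radius * np.cos(theta)   # aerial row
    xi = np.clip(np.round(src_x).astype(int), 0, size - 1)
    yi = np.clip(np.round(src_y).astype(int), 0, size - 1)
    return aerial[yi, xi]


def estimate_shift(aerial_feat, ground_feat):
    """Circular cross-correlation along the width (azimuth) axis.

    aerial_feat: (H, W_a, C) features of the polar-transformed aerial image.
    ground_feat: (H, W_g, C) ground-view features, with W_g <= W_a.
    Returns the column shift with the highest correlation and all scores.
    """
    w_a = aerial_feat.shape[1]
    w_g = ground_feat.shape[1]
    scores = np.empty(w_a)
    for shift in range(w_a):
        # Columns of the aerial features covered by the ground view
        # at this candidate shift, wrapping around at 360 degrees.
        idx = (shift + np.arange(w_g)) % w_a
        scores[shift] = np.sum(aerial_feat[:, idx, :] * ground_feat)
    return int(np.argmax(scores)), scores


# Toy usage: random arrays stand in for learned features.
rng = np.random.default_rng(0)
aerial_feat = rng.standard_normal((4, 16, 8))
true_shift = 5
# A limited-FoV "ground view": 8 of the 16 azimuth columns.
ground_feat = aerial_feat[:, (true_shift + np.arange(8)) % 16, :]
shift, scores = estimate_shift(aerial_feat, ground_feat)
# Convert the column shift to an angle (direction convention assumed).
azimuth_deg = 360.0 * shift / aerial_feat.shape[1]
```

Under these assumptions, the shift with the highest score gives the orientation estimate, and the aerial features cropped at that shift could then be compared with the ground features to score the location match, which is the role the abstract assigns to the correlation step.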
Pages: 4063-4071
Number of pages: 9
Related papers
22 in total
[1]  
[Anonymous], 2015, ACS SYM SER
[2]  
[Anonymous], 2019, ARXIV190406281
[3]   NetVLAD: CNN architecture for weakly supervised place recognition [J].
Arandjelovic, Relja ;
Gronat, Petr ;
Torii, Akihiko ;
Pajdla, Tomas ;
Sivic, Josef .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :5297-5307
[4]   Ground-to-Aerial Image Geo-Localization with a Hard Exemplar Reweighting Triplet Loss [J].
Cai, Sudong ;
Guo, Yulan ;
Khan, Salman ;
Hu, Jiwei ;
Wen, Gongjian .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :8390-8399
[5]   Semantic Cross-View Matching [J].
Castaldo, Francesco ;
Zamir, Amir ;
Angst, Roland ;
Palmieri, Francesco ;
Savarese, Silvio .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOP (ICCVW), 2015, :1044-1052
[6]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[7]  
Hu Sixing, 2018, IEEE C COMPUTER VISI
[8]   ImageNet Classification with Deep Convolutional Neural Networks [J].
Krizhevsky, Alex ;
Sutskever, Ilya ;
Hinton, Geoffrey E. .
COMMUNICATIONS OF THE ACM, 2017, 60 (06) :84-90
[9]  
Lenc K, 2015, PROC CVPR IEEE, P991, DOI 10.1109/CVPR.2015.7298701
[10]   Cross-View Image Geolocalization [J].
Lin, Tsung-Yi ;
Belongie, Serge ;
Hays, James .
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, :891-898