Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching

Cited by: 141

Authors:

Shi, Yujiao [1,2]

Yu, Xin [1,2,3]

Campbell, Dylan [1,2]

Li, Hongdong [1,2]

Affiliations:

[1] Australian National University, Canberra, ACT, Australia

[2] Australian Centre for Robotic Vision, Brisbane, Qld, Australia

[3] University of Technology Sydney, Sydney, NSW, Australia

Source:

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2020

Funding:

Australian Research Council

Keywords:

DOI:

10.1109/CVPR42600.2020.00412

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Cross-view geo-localization is the problem of estimating the position and orientation (latitude, longitude and azimuth angle) of a camera at ground level given a large-scale database of geo-tagged aerial (e.g., satellite) images. Existing approaches treat the task as a pure location estimation problem by learning discriminative feature descriptors, but neglect orientation alignment. It is well-recognized that knowing the orientation between ground and aerial images can significantly reduce matching ambiguity between these two views, especially when the ground-level images have a limited Field of View (FoV) instead of a full field-of-view panorama. Therefore, we design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization. In particular, we address the cross-view domain gap by applying a polar transform to the aerial images to approximately align the images up to an unknown azimuth angle. Then, a two-stream convolutional network is used to learn deep features from the ground and polar-transformed aerial images. Finally, we obtain the orientation by computing the correlation between cross-view features, which also provides a more accurate measure of feature similarity, improving location recall. Experiments on standard datasets demonstrate that our method significantly improves state-of-the-art performance. Remarkably, we improve the top-1 location recall rate on the CVUSA dataset by a factor of 1.5x for panoramas with known orientation, by a factor of 3.3x for panoramas with unknown orientation, and by a factor of 6x for 180°-FoV images with unknown orientation.
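The orientation step described in the abstract, correlating ground features against polar-transformed aerial features along the azimuth axis, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The array layout (height, azimuth columns, channels), the function names `circular_correlation` and `estimate_orientation`, and the use of random arrays in place of learned CNN features are all assumptions made for illustration.

```python
import numpy as np


def circular_correlation(aerial_feat, ground_feat):
    """Score every azimuth shift of the ground features against the aerial features.

    Assumed layout (illustrative, not from the paper):
      aerial_feat: (H, W_a, C), columns span the full 360 degrees of azimuth
      ground_feat: (H, W_g, C), W_g <= W_a (a limited-FoV image covers fewer columns)
    Returns an array of W_a correlation scores, one per candidate shift.
    """
    w_a = aerial_feat.shape[1]
    w_g = ground_feat.shape[1]
    scores = np.empty(w_a)
    for shift in range(w_a):
        # Wrap around the azimuth axis so the window is circular.
        cols = (shift + np.arange(w_g)) % w_a
        scores[shift] = np.sum(aerial_feat[:, cols, :] * ground_feat)
    return scores


def estimate_orientation(aerial_feat, ground_feat):
    """Return the best column shift and the corresponding azimuth in degrees."""
    scores = circular_correlation(aerial_feat, ground_feat)
    shift = int(np.argmax(scores))
    azimuth_deg = 360.0 * shift / aerial_feat.shape[1]
    return shift, azimuth_deg


# Toy check: random "aerial" features, with a 180-degree "ground" view
# cut out at a known shift (hypothetical data, not a real dataset).
rng = np.random.default_rng(0)
aerial = rng.standard_normal((4, 64, 8))
true_shift = 40
ground = aerial[:, (true_shift + np.arange(32)) % 64, :]

shift, azimuth = estimate_orientation(aerial, ground)
print(shift, azimuth)
```

In this toy setting the ground features are an exact crop of the aerial features, so the correlation peak sits at the true shift. With real learned features the peak would only be approximate, and the abstract indicates that the correlation result also feeds the similarity measure used for retrieval.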


Pages: 4063-4071

Page count: 9

