Where in the World Is This Image? Transformer-Based Geo-localization in the Wild

Cited by: 12

Authors:

Pramanick, Shraman [1]

Nowara, Ewa M. [1]

Gleason, Joshua [2]

Castillo, Carlos D. [1]

Chellappa, Rama [1]

Affiliations:

[1] Johns Hopkins Univ, Baltimore, MD 21218 USA

[2] Univ Maryland, College Pk, MD 20742 USA

Source:

COMPUTER VISION, ECCV 2022, PT XXXVIII | 2022 / Vol. 13698

Keywords:

Geo-location estimation; Vision transformer; Multi-task learning; Semantic segmentation;

DOI:

10.1007/978-3-031-19839-7_12

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem. The challenges include the huge diversity of images due to different environmental scenarios and the drastic variation in the appearance of the same location depending on the time of day, weather, and season. More importantly, the prediction is made from a single image that may contain only a few geo-locating cues. For these reasons, most existing works are restricted to specific cities, imagery, or worldwide landmarks. In this work, we focus on developing an efficient solution to planet-scale single-image geo-localization. To this end, we propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image and produces robust feature representations under extreme appearance variations. TransLocator takes an RGB image and its semantic segmentation map as inputs, exchanges information between its two parallel branches after each transformer layer, and simultaneously performs geo-localization and scene recognition in a multi-task fashion. We evaluate TransLocator on four benchmark datasets (Im2GPS, Im2GPS3k, YFCC4k, and YFCC26k) and obtain 5.5%, 14.1%, 4.9%, and 9.9% continent-level accuracy improvements, respectively, over the state-of-the-art. TransLocator is also validated on real-world test images and found to be more effective than previous methods.

Pages: 196-215

Page count: 20
