Joint Representation Learning and Keypoint Detection for Cross-View Geo-Localization

Cited by: 78
Authors
Lin, Jinliang [1 ]
Zheng, Zhedong [2 ]
Zhong, Zhun [3 ]
Luo, Zhiming [1 ]
Li, Shaozi [1 ]
Yang, Yi [4 ]
Sebe, Nicu [3 ]
Affiliations
[1] Xiamen Univ, Dept Artificial Intelligence, Xiamen 361005, Peoples R China
[2] Natl Univ Singapore, NExT, Sch Comp, Singapore 118404, Singapore
[3] Univ Trento, Dept Informat Engn & Comp Sci DISI, I-38123 Trento, Italy
[4] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Convolution; Task analysis; Location awareness; Visualization; Visual systems; Representation learning; Geo-localization; representation learning; keypoint; attention; ATTENTION; NETWORK; SCALE;
DOI
10.1109/TIP.2022.3175601
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study the cross-view geo-localization problem to match images from different viewpoints. The key motivation underpinning this task is to learn a discriminative viewpoint-invariant visual representation. Inspired by the human visual system for mining local patterns, we propose a new framework called RK-Net to jointly learn the discriminative Representation and detect salient Keypoints with a single Network. Specifically, we introduce a Unit Subtraction Attention Module (USAM) that can automatically discover representative keypoints from feature maps and draw attention to the salient regions. USAM contains very few learning parameters but yields significant performance improvement and can be easily plugged into different networks. We demonstrate through extensive experiments that (1) by incorporating USAM, RK-Net facilitates end-to-end joint learning without the prerequisite of extra annotations. Representation learning and keypoint detection are two highly related tasks. Representation learning aids keypoint detection. Keypoint detection, in turn, enriches the model capability against large appearance changes caused by viewpoint variations. (2) USAM is easy to implement and can be integrated with existing methods, further improving the state-of-the-art performance. We achieve competitive geo-localization accuracy on three challenging datasets, i.e., University-1652, CVUSA and CVACT. Our code is available at https://github.com/AggMan96/RK-Net.
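The abstract's core idea, a "unit subtraction" attention that highlights salient units in a feature map with almost no extra parameters, can be illustrated with a toy sketch. This is not the paper's actual USAM formulation (see the linked repository for that); it only demonstrates one plausible reading, in which each unit's local minimum is subtracted from the unit itself so high-contrast peaks receive attention weights near 1. The function name and the `k` window parameter are illustrative assumptions.

```python
import math

def unit_subtraction_attention(fmap, k=3):
    """Toy subtraction-based attention over a 2-D feature map.

    fmap: 2-D list of floats (H x W single-channel feature map).
    For each unit, subtract the minimum over its k x k neighbourhood
    (edge-clamped), squash the difference through a sigmoid to get an
    attention weight in (0, 1], and reweight the input. NOTE: this is
    a hypothetical sketch of the 'unit subtraction' idea, not the
    published USAM architecture.
    """
    H, W = len(fmap), len(fmap[0])
    pad = k // 2

    def at(i, j):
        # Clamp indices so border units reuse edge values (edge padding).
        return fmap[min(max(i, 0), H - 1)][min(max(j, 0), W - 1)]

    out, att = [], []
    for i in range(H):
        orow, arow = [], []
        for j in range(W):
            local_min = min(at(i + di, j + dj)
                            for di in range(-pad, pad + 1)
                            for dj in range(-pad, pad + 1))
            contrast = fmap[i][j] - local_min        # unit subtraction
            a = 1.0 / (1.0 + math.exp(-contrast))    # sigmoid gate
            arow.append(a)
            orow.append(fmap[i][j] * a)              # reweight input
        out.append(orow)
        att.append(arow)
    return out, att
```

Because `contrast` is always non-negative, every attention weight lies in [0.5, 1.0]; flat regions get the neutral weight 0.5 while local peaks are amplified, which mirrors the keypoint-highlighting behaviour the abstract describes.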
Pages: 3780-3792
Page count: 13