Robust Visual Place Recognition for Severe Appearance Changes

Cited by: 0
Authors
Jiang, Haiyang [1]
Piao, Songhao [1]
Yu, Huai [2]
Li, Wei [3]
Yu, Lei [2]
Affiliations
[1] Harbin Inst Technol, Multiagent Robot Res Ctr, Dept Fac Comp, Harbin 150001, Peoples R China
[2] Wuhan Univ, Sch Elect Informat, Wuhan 430072, Peoples R China
[3] Soochow Univ, Suzhou 215031, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Training; Transformers; Pipelines; Costs; Encoding; Visualization; Geometric verification; global retrieval; reranking; visual place recognition;
DOI
10.1109/LRA.2024.3376967
CLC number (Chinese Library Classification)
TP24 [Robotics]
Discipline codes
080202; 1405
Abstract
Severe appearance changes are a pervasive and difficult challenge in Visual Place Recognition (VPR), and the current best solution adopts a composite strategy that combines global retrieval with reranking. However, these reranking techniques require elaborate local-feature extraction and matching, which substantially increases computational cost and inference time. To address this, we propose a novel framework that unifies global and local features within a single pipeline network: a simple solution that works across diverse scenarios without additional complex components. Specifically, we train discriminative global features using image classification techniques while extracting effective local features directly from the intermediate layers, with no extra operations. To make the features more expressive, we fuse multi-layer Convolutional Neural Network (CNN) feature maps that carry diverse semantic information. In addition, a Transformer with relative position encoding captures long-range and positional correlations across layers. Combined with the associated attention values, the low-resolution feature maps reduce the number of features involved in matching, which lowers computational overhead and greatly accelerates reranking. Extensive experiments show that our model achieves State-Of-The-Art (SOTA) performance on datasets with severe appearance changes, with the fastest inference time and minimal memory usage.
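The abstract describes a two-stage pipeline: global retrieval to shortlist candidates, then reranking that matches only a reduced set of local features selected with attention values. The following is a minimal, hypothetical Python sketch of that generic retrieve-then-rerank idea, not the authors' implementation. The descriptors are hand-written toy vectors rather than CNN/Transformer outputs, and the helper names, the top-n attention selection rule, the mutual-nearest-neighbour match count and the `min_sim` threshold are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def global_retrieval(query_global: Sequence[float], db_globals: List[Vector], k: int) -> List[int]:
    """Stage 1: indices of the k database images with the most similar global descriptor."""
    q = l2_normalize(query_global)
    scores = [dot(q, l2_normalize(g)) for g in db_globals]
    order = sorted(range(len(db_globals)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def select_by_attention(local_feats: List[Vector], attention: Sequence[float], n: int) -> List[Vector]:
    """Keep the n local features with the highest attention values (assumed selection rule)."""
    order = sorted(range(len(local_feats)), key=lambda i: attention[i], reverse=True)
    return [l2_normalize(local_feats[i]) for i in order[:n]]


def mutual_nn_count(a: List[Vector], b: List[Vector], min_sim: float = 0.5) -> int:
    """Count mutual nearest-neighbour pairs whose cosine similarity is at least min_sim."""
    if not a or not b:
        return 0
    sim = [[dot(x, y) for y in b] for x in a]
    a_to_b = [max(range(len(b)), key=lambda j: row[j]) for row in sim]
    b_to_a = [max(range(len(a)), key=lambda i: sim[i][j]) for j in range(len(b))]
    return sum(1 for i, j in enumerate(a_to_b) if b_to_a[j] == i and sim[i][j] >= min_sim)


def retrieve_and_rerank(query: Dict, database: List[Dict], k: int = 2, n: int = 2) -> List[int]:
    """Stage 2: rerank the top-k global candidates by the number of local matches.

    Each image is a dict with hypothetical keys 'global', 'local' and 'attn'.
    """
    candidates = global_retrieval(query["global"], [d["global"] for d in database], k)
    q_local = select_by_attention(query["local"], query["attn"], n)

    def local_score(idx: int) -> int:
        d = database[idx]
        return mutual_nn_count(q_local, select_by_attention(d["local"], d["attn"], n))

    # sorted() stays stable with reverse=True, so ties keep the global-retrieval order.
    return sorted(candidates, key=local_score, reverse=True)


# Toy data (hypothetical): image 0 looks closest globally, but image 1 matches locally.
query = {"global": [1.0, 0.0], "local": [[1, 0], [0, 1], [1, 1]], "attn": [0.9, 0.8, 0.1]}
database = [
    {"global": [0.9, 0.1], "local": [[-1, 0], [0, -1]], "attn": [0.5, 0.5]},
    {"global": [0.8, 0.2], "local": [[0, 1], [1, 0]], "attn": [0.7, 0.6]},
    {"global": [0.0, 1.0], "local": [[0, 1]], "attn": [1.0]},
]
print(global_retrieval(query["global"], [d["global"] for d in database], k=2))  # [0, 1]
print(retrieve_and_rerank(query, database))  # [1, 0]
```

The point the sketch tries to convey is the cost argument in the abstract: reranking only runs local matching on the k shortlisted images, and only on the n highest-attention features of each, so the quadratic similarity table stays n-by-n instead of growing with the full feature-map resolution.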
Pages: 4289-4296
Page count: 8
References
39 in total
[1] Ali-Bey A., 2023, arXiv
[2] Ali-bey, Amar; Chaib-draa, Brahim; Giguere, Philippe. MixVPR: Feature Mixing for Visual Place Recognition. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023: 2997-3006.
[3] Ali-bey, Amar; Chaib-draa, Brahim; Giguere, Philippe. GSV-CITIES: Toward appropriate supervised visual place recognition. Neurocomputing, 2022, 513: 194-203.
[4] Arandjelovic R, 2018, IEEE T PATTERN ANAL, V40, P1437, DOI [10.1109/TPAMI.2017.2711011, 10.1109/CVPR.2016.572]
[5] Barbarani G., 2023, P IEEECVF C COMPUTER, P6154
[6] Berton G., 2023, P IEEE INT C COMPUTE, P11080
[7] Berton, Gabriele; Masone, Carlo; Caputo, Barbara. Rethinking Visual Geo-localization for Large-Scale Applications. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 4868-4878.
[8] Cao, Bingyi; Araujo, Andre; Sim, Jack. Unifying Deep Local and Global Features for Image Search. Computer Vision - ECCV 2020, Pt. XX, 2020, 12365: 726-743.
[9] Chen, Zetao; Liu, Lingqiao; Sa, Inkyu; Ge, Zongyuan; Chli, Margarita. Learning Context Flexible Attention Model for Long-Term Visual Place Recognition. IEEE Robotics and Automation Letters, 2018, 3(4): 4015-4022.
[10] Chu, Xiao; Yang, Wei; Ouyang, Wanli; Ma, Cheng; Yuille, Alan L.; Wang, Xiaogang. Multi-Context Attention for Human Pose Estimation. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 5669-5678.