MS-NetVLAD: Multi-Scale NetVLAD for Visual Place Recognition

Cited by: 1

Authors:

Uggi, Anuradha [1]

Channappayya, Sumohana S. [1]

Affiliations:

[1] IIT Hyderabad, Dept Elect Engn, Kandi 502284, India

Source:

IEEE Signal Processing Letters | 2024 / Vol. 31

Keywords:

Visualization; Image recognition; Transforms; Contrastive learning; Benchmark testing; Feature extraction; Vectors; Image matching; visual place recognition; scale invariance; NetVLAD;

DOI:

10.1109/LSP.2024.3425279

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Many successful Visual Place Recognition (VPR) techniques operate in a contrastive learning framework using features extracted from a Convolutional Neural Network (CNN) backbone. Among these, NetVLAD is a popular framework that transforms the classical Vector of Locally Aggregated Descriptors (VLAD) method into a modern data-driven model. Introducing learnability in VLAD has led to several variants of NetVLAD, such as Patch-NetVLAD. However, many of these use only the bottleneck features of the backbone model, ignoring the rest of the feature hierarchy. A few state-of-the-art models adopt complex architectures to improve the quality of features. In this letter, we propose a simple extension to NetVLAD that leverages the feature representations from intermediate layers of the CNN backbone in addition to the bottleneck features. We conduct extensive experiments to demonstrate the significance of these intermediate features for VPR. The proposed method, which we call Multi-Scale-NetVLAD (MS-NetVLAD), surpasses the successful NetVLAD and Patch-NetVLAD models by a significant margin. We demonstrate consistent performance improvements on large-scale VPR benchmarks, including Pittsburgh 30k, Tokyo 24/7, Nordland, and MSLS. This improvement is attributed to the complementary multi-scale features employed by MS-NetVLAD. Importantly, this work reinforces the inherent strength of the NetVLAD framework for VPR. Further, MS-NetVLAD is shown to be competitive with state-of-the-art VPR models such as MixVPR and R2Former.
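
As a rough illustration of the idea in the abstract, the sketch below aggregates local features from two backbone stages with a NetVLAD-style soft assignment and joins the results into one descriptor. It is not the authors' implementation. The abstract does not say how the per-scale descriptors are combined, so concatenation followed by L2 normalization is an assumption here. The function names (`soft_vlad`, `multi_scale_descriptor`), the fixed toy centroids, and the `alpha` sharpness value are all hypothetical. In a trained NetVLAD layer the assignment weights and centroids are learned rather than fixed.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against all-zero vectors)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / (n + eps) for x in v]


def soft_vlad(features, centroids, alpha=10.0):
    """NetVLAD-style aggregation of N local descriptors into a K*D vector.

    Soft assignments come from negative squared distances to fixed centroids
    (assumption: a trained layer would learn these parameters instead).
    """
    K, D = len(centroids), len(centroids[0])
    vlad = [[0.0] * D for _ in range(K)]
    for x in features:
        logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                  for c in centroids]
        a = softmax(logits)
        for k in range(K):
            for d in range(D):
                # Accumulate the assignment-weighted residual to centroid k.
                vlad[k][d] += a[k] * (x[d] - centroids[k][d])
    # Intra-normalize each cluster's residual, then normalize the whole vector.
    flat = [v for row in vlad for v in l2_normalize(row)]
    return l2_normalize(flat)


def multi_scale_descriptor(feature_sets, centroids_per_scale):
    """Concatenate per-scale VLAD vectors (assumed fusion) and L2-normalize."""
    parts = []
    for feats, cents in zip(feature_sets, centroids_per_scale):
        parts.extend(soft_vlad(feats, cents))
    return l2_normalize(parts)


# Toy local features standing in for two backbone stages (hypothetical values).
intermediate = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # 3 locations, D = 2
bottleneck = [[0.2, 0.1, 0.7], [0.6, 0.3, 0.1]]       # 2 locations, D = 3
cents_a = [[1.0, 0.0], [0.0, 1.0]]                    # K = 2 centroids, D = 2
cents_b = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]          # K = 2 centroids, D = 3

desc = multi_scale_descriptor([intermediate, bottleneck], [cents_a, cents_b])
print(len(desc))  # 2*2 + 2*3 = 10
```

Under this assumed fusion each scale contributes a block of K x D values, so the descriptor length grows with the number of scales used. Place matching would then compare such descriptors by distance or cosine similarity.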


Pages: 1855-1859

Page count: 5

