MS-NetVLAD: Multi-Scale NetVLAD for Visual Place Recognition

Cited by: 1
Authors
Uggi, Anuradha [1]
Channappayya, Sumohana S. [1]
Affiliation
[1] IIT Hyderabad, Dept Elect Engn, Kandi 502284, India
Keywords
Visualization; Image recognition; Transforms; Contrastive learning; Benchmark testing; Feature extraction; Vectors; Image matching; visual place recognition; scale invariance; NetVLAD;
DOI
10.1109/LSP.2024.3425279
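Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract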
Many successful Visual Place Recognition (VPR) techniques operate in a contrastive learning framework using features extracted from a Convolutional Neural Network (CNN) backbone. Among these, NetVLAD is a popular framework that transforms the classical Vector of Locally Aggregated Descriptors (VLAD) method into a modern data-driven model. Introducing learnability in VLAD has led to several variants of NetVLAD, such as Patch-NetVLAD. However, many of these use only the bottleneck features of the backbone model, ignoring the rest of the feature hierarchy. A few state-of-the-art models adopt complex architectures to improve the quality of features. In this letter, we propose a simple extension to NetVLAD that leverages the feature representations from intermediate layers of the CNN backbone in addition to the bottleneck features. We conduct extensive experiments to demonstrate the significance of these intermediate features for VPR. The proposed method, which we call Multi-Scale-NetVLAD (MS-NetVLAD), surpasses the successful NetVLAD and Patch-NetVLAD models by a significant margin. We demonstrate consistent performance improvements on large-scale VPR benchmarks, including Pittsburgh 30k, Tokyo 24/7, Nordland, and MSLS. This improvement is attributed to the complementary multi-scale features employed by MS-NetVLAD. Importantly, this work reinforces the inherent strength of the NetVLAD framework for VPR. Further, MS-NetVLAD is shown to be competitive with state-of-the-art VPR models such as MixVPR and R2Former.
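The idea of aggregating features from more than one backbone layer can be illustrated with a minimal sketch. The abstract does not specify how MS-NetVLAD fuses the scales, so everything below is an assumption made for illustration: each scale is aggregated with a soft-assignment VLAD (softmax over negative squared distances, as in the original NetVLAD derivation, but with fixed rather than learned centroids), and the per-scale descriptors are simply concatenated and L2-normalized. The names `soft_vlad`, `multi_scale_descriptor`, and `alpha` are hypothetical and do not come from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def soft_vlad(descriptors, centroids, alpha=1.0):
    """Soft-assignment VLAD over a list of D-dimensional local descriptors.

    Assumption: centroids are fixed here; in NetVLAD they are learned.
    """
    k, d = len(centroids), len(centroids[0])
    vlad = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Softmax over negative squared distances gives the soft assignment.
        logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                  for c in centroids]
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        for j, c in enumerate(centroids):
            a = weights[j] / total
            for i in range(d):
                # Accumulate the weighted residual to centroid j.
                vlad[j][i] += a * (x[i] - c[i])
    # Intra-normalize each cluster's residual, flatten, then L2-normalize.
    flat = [v for row in vlad for v in l2_normalize(row)]
    return l2_normalize(flat)


def multi_scale_descriptor(feature_maps, centroids_per_scale, alpha=1.0):
    """Concatenate per-scale VLAD descriptors (assumed fusion, not the paper's)."""
    parts = []
    for descs, cents in zip(feature_maps, centroids_per_scale):
        parts.extend(soft_vlad(descs, cents, alpha))
    return l2_normalize(parts)


# Toy inputs: one "intermediate" scale (2-D features) and one "bottleneck"
# scale (3-D features), each with two centroids.
mid = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
deep = [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]]
cents_mid = [[1.0, 0.0], [0.0, 1.0]]
cents_deep = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]

desc = multi_scale_descriptor([mid, deep], [cents_mid, cents_deep])
print(len(desc))  # 2 clusters * 2 dims + 2 clusters * 3 dims = 10
```

In this sketch the global descriptor length is the sum of (clusters × feature dimension) over the scales, so each added layer lengthens the descriptor; whether the paper reduces dimensionality afterwards is not stated in the abstract.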
Pages: 1855-1859
Page count: 5