MixVPR: Feature Mixing for Visual Place Recognition

Times cited: 73
Authors
Ali-bey, Amar [1]
Chaib-draa, Brahim [1]
Giguere, Philippe [1]
Affiliation
[1] Univ Laval, Quebec City, PQ, Canada
Source
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023
Keywords
MODEL;
DOI
10.1109/WACV56688.2023.00301
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual Place Recognition (VPR) is a crucial part of mobile robotics and autonomous driving, as well as of other computer vision tasks. It refers to the process of identifying the place depicted in a query image using only computer vision. At large scale, repetitive structures and changes in weather and illumination pose a real challenge, because appearances can change drastically over time. Beyond tackling these challenges, an efficient VPR technique must also be practical in real-world scenarios where latency matters. To address this, we introduce MixVPR, a new holistic feature aggregation technique that takes feature maps from pre-trained backbones as a set of global features. It then incorporates global relationships between the elements of each feature map through a cascade of feature mixing, eliminating the need for the local or pyramidal aggregation used in NetVLAD or TransVPR. We demonstrate the effectiveness of our technique through extensive experiments on multiple large-scale benchmarks. Our method outperforms all existing techniques by a large margin while having less than half the number of parameters of CosPlace and NetVLAD. We achieve a new all-time high recall@1 of 94.6% on Pitts250k-test, 88.0% on MapillarySLS, and, more importantly, 58.4% on Nordland. Finally, our method outperforms two-stage retrieval techniques such as Patch-NetVLAD, TransVPR and SuperGlue, all while being orders of magnitude faster.
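To make "feature maps as a set of global features" and "a cascade of feature mixing" concrete, here is a minimal NumPy sketch of the idea. It is not the authors' implementation. It assumes the following: each of the c channels of a (c, h, w) backbone feature map is flattened into one row of length h*w. Each mixing block is a LayerNorm (without learnable scale or shift), a two-layer ReLU MLP applied to every row, and a residual connection. A final pair of bias-free linear projections (channels c to d, spatial h*w to r) produces an L2-normalized descriptor of length d*r. The function names, dimensions and random weights are hypothetical stand-ins for a trained model.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row (one flattened feature map) to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def feature_mixer(x, w1, b1, w2, b2):
    # x: (c, s). The same two-layer MLP is applied to every row, so each output
    # element depends on all s spatial positions of that row (global mixing).
    hidden = np.maximum(layer_norm(x) @ w1 + b1, 0.0)  # ReLU
    return x + hidden @ w2 + b2                          # residual connection

def mixvpr_like_descriptor(fmap, mixers, w_depth, w_row):
    # fmap: (c, h, w) feature map from a (hypothetical) pre-trained backbone.
    c, h, w = fmap.shape
    x = fmap.reshape(c, h * w)        # each channel map becomes one global feature
    for params in mixers:             # cascade of feature-mixing blocks
        x = feature_mixer(x, *params)
    z = w_depth @ x @ w_row           # (d, c) @ (c, s) @ (s, r) -> (d, r)
    z = z.reshape(-1)                 # flatten to a d*r vector
    return z / np.linalg.norm(z)      # L2-normalize for cosine/Euclidean retrieval

# Usage with toy sizes and random (untrained) weights.
rng = np.random.default_rng(0)
c, h, w, d, r, num_blocks = 8, 4, 4, 6, 2, 2
s = h * w
mixers = [
    (rng.standard_normal((s, s)) * 0.1, np.zeros(s),
     rng.standard_normal((s, s)) * 0.1, np.zeros(s))
    for _ in range(num_blocks)
]
w_depth = rng.standard_normal((d, c))
w_row = rng.standard_normal((s, r))
desc = mixvpr_like_descriptor(rng.standard_normal((c, h, w)), mixers, w_depth, w_row)
print(desc.shape)  # (12,)
```

The MLP sees the whole flattened map at once, which is why no local (NetVLAD-style) or pyramidal pooling step is needed. The two small projections keep the final descriptor compact.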
Pages: 2997-3006
Number of pages: 10