MatchFormer: Interleaving Attention in Transformers for Feature Matching

Cited by: 24
Authors
Wang, Qing [1]
Zhang, Jiaming [1]
Yang, Kailun [1]
Peng, Kunyu [1]
Stiefelhagen, Rainer [1]
Affiliations
[1] Karlsruhe Inst Technol, Karlsruhe, Germany
Source
COMPUTER VISION - ACCV 2022, PT III | 2023 / Vol. 13843
Keywords
Feature matching; Vision transformers
DOI
10.1007/978-3-031-26313-2_16
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Local feature matching is a computationally intensive task at the subpixel level. While detector-based methods coupled with feature descriptors struggle in low-texture scenes, CNN-based methods with a sequential extract-to-match pipeline fail to make use of the matching capacity of the encoder and tend to overburden the decoder for matching. In contrast, we propose a novel hierarchical extract-and-match transformer, termed MatchFormer. Inside each stage of the hierarchical encoder, we interleave self-attention for feature extraction and cross-attention for feature matching, yielding a human-intuitive extract-and-match scheme. Such a match-aware encoder relieves the overloaded decoder and makes the model highly efficient. Further, combining self- and cross-attention on multi-scale features in a hierarchical architecture improves matching robustness, particularly in low-texture indoor scenes or with less outdoor training data. Thanks to this strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision. Compared to the previous best method in indoor pose estimation, our lite MatchFormer requires only 45% of the GFLOPs, yet achieves a +1.3% precision gain and a 41% running speed boost. The large MatchFormer reaches state-of-the-art on four different benchmarks, including indoor pose estimation (ScanNet), outdoor pose estimation (MegaDepth), homography estimation and image matching (HPatch), and visual localization (InLoc).
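The interleaving idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`attention`, `interleaved_stage`), the single-head attention, the weights shared between the two images, and the toy feature sizes are all assumptions made for illustration. It only shows how self-attention (each image attends to itself) and cross-attention (each image attends to the other) might alternate within one encoder stage.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query_src, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention with a residual connection.

    Queries come from `query_src`; keys and values come from `context`.
    """
    q = query_src @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return query_src + softmax(scores) @ v


def interleaved_stage(feat_a, feat_b, params, pattern=("self", "cross")):
    """Hypothetical encoder stage that alternates self- and cross-attention.

    feat_a, feat_b: (num_tokens, dim) token features of the two images.
    params: one (w_q, w_k, w_v) tuple per entry in `pattern`
            (weights shared by both images, an assumption of this sketch).
    """
    for kind, (w_q, w_k, w_v) in zip(pattern, params):
        if kind == "self":
            # Feature extraction: each image attends to its own tokens.
            new_a = attention(feat_a, feat_a, w_q, w_k, w_v)
            new_b = attention(feat_b, feat_b, w_q, w_k, w_v)
        else:
            # Feature matching: each image attends to the other image.
            new_a = attention(feat_a, feat_b, w_q, w_k, w_v)
            new_b = attention(feat_b, feat_a, w_q, w_k, w_v)
        feat_a, feat_b = new_a, new_b
    return feat_a, feat_b


rng = np.random.default_rng(0)
dim = 8  # toy embedding size
params = [
    tuple(rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
    for _ in range(2)
]
feat_a = rng.normal(size=(5, dim))  # 5 tokens from image A
feat_b = rng.normal(size=(7, dim))  # 7 tokens from image B
out_a, out_b = interleaved_stage(feat_a, feat_b, params)
```

In this sketch, a stage with only `"self"` layers leaves image A's features independent of image B, whereas adding a `"cross"` layer makes them depend on B, which is the sense in which the encoder becomes match-aware.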
Pages: 256-273
Page count: 18