MatchFormer: Interleaving Attention in Transformers for Feature Matching

Cited by: 24
|
Authors
Wang, Qing [1 ]
Zhang, Jiaming [1 ]
Yang, Kailun [1 ]
Peng, Kunyu [1 ]
Stiefelhagen, Rainer [1 ]
Affiliations
[1] Karlsruhe Inst Technol, Karlsruhe, Germany
Source
COMPUTER VISION - ACCV 2022, PT III | 2023, Vol. 13843
Keywords
Feature matching; Vision transformers;
D O I
10.1007/978-3-031-26313-2_16
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Local feature matching is a computationally intensive task at the subpixel level. While detector-based methods coupled with feature descriptors struggle in low-texture scenes, CNN-based methods with a sequential extract-to-match pipeline fail to make use of the matching capacity of the encoder and tend to overburden the decoder for matching. In contrast, we propose a novel hierarchical extract-and-match transformer, termed MatchFormer. Inside each stage of the hierarchical encoder, we interleave self-attention for feature extraction and cross-attention for feature matching, yielding a human-intuitive extract-and-match scheme. Such a match-aware encoder relieves the overloaded decoder and makes the model highly efficient. Further, combining self- and cross-attention on multi-scale features in a hierarchical architecture improves matching robustness, particularly in low-texture indoor scenes or with less outdoor training data. Thanks to this strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision. Compared to the previous best method in indoor pose estimation, our lite MatchFormer has only 45% of the GFLOPs, yet achieves a +1.3% precision gain and a 41% running speed boost. The large MatchFormer reaches state-of-the-art on four different benchmarks, including indoor pose estimation (ScanNet), outdoor pose estimation (MegaDepth), homography estimation and image matching (HPatches), and visual localization (InLoc).
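The interleaved extract-and-match scheme described in the abstract can be illustrated with a minimal NumPy sketch. This is only a toy illustration of the idea, not the paper's implementation: linear projections, multi-head splits, positional encodings, efficient attention, and the hierarchical downsampling of the actual MatchFormer are all omitted, and every function and variable name here is illustrative.

```python
import numpy as np

def attention(q_feats, kv_feats, dim=32):
    """Scaled dot-product attention: query tokens attend to key/value tokens."""
    scores = q_feats @ kv_feats.T / np.sqrt(dim)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv_feats

def interleaved_stage(feats_a, feats_b, dim=32):
    """One toy encoder stage: self-attention (extract), then cross-attention (match)."""
    # Self-attention: each image's tokens attend to themselves (feature extraction).
    feats_a = feats_a + attention(feats_a, feats_a, dim)
    feats_b = feats_b + attention(feats_b, feats_b, dim)
    # Cross-attention: each image's tokens attend to the other image (feature matching),
    # so match awareness is built into the encoder rather than left to the decoder.
    new_a = feats_a + attention(feats_a, feats_b, dim)
    new_b = feats_b + attention(feats_b, feats_a, dim)
    return new_a, new_b

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 32))  # 64 tokens from image A, feature dim 32
b = rng.standard_normal((64, 32))  # 64 tokens from image B
for _ in range(3):  # stacked stages; real stages would also downsample hierarchically
    a, b = interleaved_stage(a, b)
print(a.shape, b.shape)  # (64, 32) (64, 32)
```

Contrast this with a sequential extract-to-match pipeline, where all stages run self-attention only and cross-image interaction happens once, after extraction; interleaving instead lets every stage refine features with knowledge of the other image.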
Pages: 256 - 273
Page count: 18
Related Papers
50 records total
  • [11] Self-attention in vision transformers performs perceptual grouping, not attention
    Mehrani, Paria
    Tsotsos, John K.
    FRONTIERS IN COMPUTER SCIENCE, 2023, 5
  • [12] Feature matching method: Sparse feature tree
    Department of Computer Science and Engineering, Fudan University, Shanghai 200433, China
    RUAN JIAN XUE BAO, 2006, (5): 1026 - 1033
  • [13] Multi-Manifold Attention for Vision Transformers
    Konstantinidis, Dimitrios
    Papastratis, Ilias
    Dimitropoulos, Kosmas
    Daras, Petros
    IEEE ACCESS, 2023, 11 : 123433 - 123444
  • [14] MSGA-Net: Progressive Feature Matching via Multi-Layer Sparse Graph Attention
    Gong, Zhepeng
    Xiao, Guobao
    Shi, Ziwei
    Chen, Riqing
    Yu, Jun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5765 - 5775
  • [15] Feature Matching and Position Matching Between Optical and SAR With Local Deep Feature Descriptor
    Liao, Yun
    Di, Yide
    Zhou, Hao
    Li, Anran
    Liu, Junhui
    Lu, Mingyu
    Duan, Qing
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 448 - 462
  • [16] FEATURE MATCHING IN GROWING DATABASES
    Pires, Bernardo Rodrigues
    Moura, Jose M. F.
    2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 1913 - 1916
  • [17] PROGRESSIVE FILTERING FOR FEATURE MATCHING
    Jiang, Xingyu
    Ma, Jiayi
    Chen, Jun
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 2217 - 2221
  • [18] Feature Descriptor Learning Based on Sparse Feature Matching
    Song, Dengpan
    Liu, Shiyuan
    Kang, Ruirui
    Ai, Danni
    2021 THE 5TH INTERNATIONAL CONFERENCE ON VIDEO AND IMAGE PROCESSING, ICVIP 2021, 2021, : 62 - 68
  • [19] AMatFormer: Efficient Feature Matching via Anchor Matching Transformer
    Jiang, Bo
    Luo, Shuxian
    Wang, Xiao
    Li, Chuanfu
    Tang, Jin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1504 - 1515
  • [20] A Feature Map Adversarial Attack Against Vision Transformers
    Altoub, Majed
    Mehmood, Rashid
    AlQurashi, Fahad
    Alqahtany, Saad
    Alsulami, Bassma
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (10) : 962 - 968