MatchFormer: Interleaving Attention in Transformers for Feature Matching

Cited by: 24
Authors
Wang, Qing [1 ]
Zhang, Jiaming [1 ]
Yang, Kailun [1 ]
Peng, Kunyu [1 ]
Stiefelhagen, Rainer [1 ]
Affiliations
[1] Karlsruhe Inst Technol, Karlsruhe, Germany
Source
COMPUTER VISION - ACCV 2022, PT III | 2023 / Vol. 13843
Keywords
Feature matching; Vision transformers
DOI
10.1007/978-3-031-26313-2_16
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Local feature matching is a computationally intensive task at the subpixel level. While detector-based methods coupled with feature descriptors struggle in low-texture scenes, CNN-based methods with a sequential extract-to-match pipeline fail to make use of the matching capacity of the encoder and tend to overburden the decoder for matching. In contrast, we propose a novel hierarchical extract-and-match transformer, termed MatchFormer. Inside each stage of the hierarchical encoder, we interleave self-attention for feature extraction and cross-attention for feature matching, yielding a human-intuitive extract-and-match scheme. Such a match-aware encoder relieves the overloaded decoder and makes the model highly efficient. Further, combining self- and cross-attention on multi-scale features in a hierarchical architecture improves matching robustness, particularly in low-texture indoor scenes or when little outdoor training data is available. Thanks to this strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision. Compared to the previous best method in indoor pose estimation, our lite MatchFormer requires only 45% of the GFLOPs, yet achieves a +1.3% precision gain and a 41% running speed boost. The large MatchFormer reaches state-of-the-art performance on four different benchmarks, including indoor pose estimation (ScanNet), outdoor pose estimation (MegaDepth), homography estimation and image matching (HPatches), and visual localization (InLoc).
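As a rough illustration of the interleaved extract-and-match scheme described in the abstract, the sketch below alternates self-attention (feature extraction) and cross-attention (feature matching) over two images inside one encoder stage. This is a minimal PyTorch-style approximation, not the authors' released implementation; the module names, block depth, and tensor dimensions are assumptions chosen for illustration only.

```python
# Minimal sketch of an interleaved self-/cross-attention encoder stage.
# Illustrative only; names, depth, and dimensions are assumptions, not the
# MatchFormer reference code.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """One transformer block; with cross=True the queries of one image attend
    to the tokens of the other image (matching), otherwise to themselves
    (extraction)."""

    def __init__(self, dim: int, num_heads: int = 4, cross: bool = False):
        super().__init__()
        self.cross = cross
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                 nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        kv = other if self.cross else x
        q, kv = self.norm_q(x), self.norm_kv(kv)
        x = x + self.attn(q, kv, kv, need_weights=False)[0]
        return x + self.mlp(x)


class InterleavedStage(nn.Module):
    """One hierarchical stage: blocks alternate self- and cross-attention,
    applied symmetrically to both images."""

    def __init__(self, dim: int, depth: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            AttentionBlock(dim, cross=(i % 2 == 1)) for i in range(depth))

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        for blk in self.blocks:
            # Update both feature maps from their pre-block values.
            feat_a, feat_b = blk(feat_a, feat_b), blk(feat_b, feat_a)
        return feat_a, feat_b


if __name__ == "__main__":
    # Flattened patch tokens of two images: (batch, tokens, channels).
    a, b = torch.randn(1, 1024, 128), torch.randn(1, 1024, 128)
    stage = InterleavedStage(dim=128)
    out_a, out_b = stage(a, b)
    print(out_a.shape, out_b.shape)
```

In the full hierarchical design described in the abstract, several such stages would operate on progressively coarser feature maps, so that the encoder itself becomes match-aware and the decoder is left with less work.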
Pages: 256 - 273
Number of pages: 18
Related papers (50 in total)
  • [1] FmCFA: a feature matching method for critical feature attention in multimodal images
    Liao, Yun
    Wu, Xuning
    Liu, Junhui
    Liu, Peiyu
    Pan, Zhixuan
    Duan, Qing
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [2] Transformer With Linear-Window Attention for Feature Matching
    Shen, Zhiwei
    Kong, Bin
    Dong, Xiaoyu
    IEEE ACCESS, 2023, 11 : 121202 - 121211
  • [3] Improving sparse graph attention for feature matching by informative keypoints exploration
    Jiang, Xingyu
    Zhang, Shihua
    Zhang, Xiao-Ping
    Ma, Jiayi
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 235
  • [4] A Hierarchical Consensus Attention Network for Feature Matching of Remote Sensing Images
    Chen, Shuang
    Chen, Jiaxuan
    Rao, Yujing
    Chen, Xiaoxian
    Fan, Xiaoyan
    Bai, Haicheng
    Xing, Lin
    Zhou, Chengjiang
    Yang, Yang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [5] AAM-ORB: affine attention module on ORB for conditioned feature matching
    Song, Shaojing
    Ai, Luxia
    Tang, Pan
    Miao, Zhiqing
    Gu, Yang
    Chai, Yu
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (05) : 2351 - 2358
  • [6] Meta network attention-based feature matching for heterogeneous defect prediction
    Nevendra, Meetesh
    Singh, Pradeep
    AUTOMATED SOFTWARE ENGINEERING, 2025, 32 (01)
  • [7] Single image super-resolution based on trainable feature matching attention network
    Chen, Qizhou
    Shao, Qing
    PATTERN RECOGNITION, 2024, 149
  • [8] Feature vector field and feature matching
    Wu, F. C.
    Wang, Z. H.
    Wang, X. G.
    PATTERN RECOGNITION, 2010, 43 (10) : 3273 - 3281
  • [9] An improved matching algorithm for feature points matching
    Yan, Yuanhui
    Xia, Haiying
    Huang, Siqi
    Xiao, Wenjing
    2014 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (ICSPCC), 2014, : 292 - 296