MatchFormer: Interleaving Attention in Transformers for Feature Matching

Cited by: 24
Authors
Wang, Qing [1 ]
Zhang, Jiaming [1 ]
Yang, Kailun [1 ]
Peng, Kunyu [1 ]
Stiefelhagen, Rainer [1 ]
Affiliations
[1] Karlsruhe Inst Technol, Karlsruhe, Germany
Source
COMPUTER VISION - ACCV 2022, PT III | 2023 / Vol. 13843
Keywords
Feature matching; Vision transformers
DOI
10.1007/978-3-031-26313-2_16
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Local feature matching is a computationally intensive task at the subpixel level. While detector-based methods coupled with feature descriptors struggle in low-texture scenes, CNN-based methods with a sequential extract-to-match pipeline fail to make use of the matching capacity of the encoder and tend to overburden the decoder for matching. In contrast, we propose a novel hierarchical extract-and-match transformer, termed MatchFormer. Inside each stage of the hierarchical encoder, we interleave self-attention for feature extraction and cross-attention for feature matching, yielding a human-intuitive extract-and-match scheme. Such a match-aware encoder relieves the overloaded decoder and makes the model highly efficient. Further, combining self- and cross-attention on multi-scale features in a hierarchical architecture improves matching robustness, particularly in low-texture indoor scenes or when little outdoor training data is available. Thanks to this strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision. Compared to the previous best method in indoor pose estimation, our lite MatchFormer requires only 45% of the GFLOPs, yet achieves a +1.3% precision gain and a 41% running speed boost. The large MatchFormer reaches state-of-the-art performance on four different benchmarks, including indoor pose estimation (ScanNet), outdoor pose estimation (MegaDepth), homography estimation and image matching (HPatches), and visual localization (InLoc).
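As a rough illustration of the interleaved extract-and-match scheme described in the abstract, the sketch below alternates self-attention (feature extraction) and cross-attention (feature matching) over two images inside one encoder stage. This is a minimal PyTorch-style approximation, not the authors' released implementation; the module names, block depth, and tensor dimensions are assumptions chosen for illustration only.

```python
# Minimal sketch of an interleaved self-/cross-attention encoder stage.
# Illustrative only; names, depth, and dimensions are assumptions, not the
# MatchFormer reference code.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """One transformer block; with cross=True the queries of one image attend
    to the tokens of the other image (matching), otherwise to themselves
    (extraction)."""

    def __init__(self, dim: int, num_heads: int = 4, cross: bool = False):
        super().__init__()
        self.cross = cross
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                 nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        kv = other if self.cross else x
        q, kv = self.norm_q(x), self.norm_kv(kv)
        x = x + self.attn(q, kv, kv, need_weights=False)[0]
        return x + self.mlp(x)


class InterleavedStage(nn.Module):
    """One hierarchical stage: blocks alternate self- and cross-attention,
    applied symmetrically to both images."""

    def __init__(self, dim: int, depth: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            AttentionBlock(dim, cross=(i % 2 == 1)) for i in range(depth))

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        for blk in self.blocks:
            # Update both feature maps from their pre-block values.
            feat_a, feat_b = blk(feat_a, feat_b), blk(feat_b, feat_a)
        return feat_a, feat_b


if __name__ == "__main__":
    # Flattened patch tokens of two images: (batch, tokens, channels).
    a, b = torch.randn(1, 1024, 128), torch.randn(1, 1024, 128)
    stage = InterleavedStage(dim=128)
    out_a, out_b = stage(a, b)
    print(out_a.shape, out_b.shape)
```

In the full hierarchical design described in the abstract, several such stages would operate on progressively coarser feature maps, so that the encoder itself becomes match-aware and the decoder is left with less work.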
Pages: 256 - 273
Number of pages: 18
Related papers (50 in total)
  • [1] FmCFA: a feature matching method for critical feature attention in multimodal images
    Liao, Yun
    Wu, Xuning
    Liu, Junhui
    Liu, Peiyu
    Pan, Zhixuan
    Duan, Qing
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [2] Transformer With Linear-Window Attention for Feature Matching
    Shen, Zhiwei
    Kong, Bin
    Dong, Xiaoyu
    IEEE ACCESS, 2023, 11 : 121202 - 121211
  • [3] Improving sparse graph attention for feature matching by informative keypoints exploration
    Jiang, Xingyu
    Zhang, Shihua
    Zhang, Xiao-Ping
    Ma, Jiayi
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 235
  • [4] A Hierarchical Consensus Attention Network for Feature Matching of Remote Sensing Images
    Chen, Shuang
    Chen, Jiaxuan
    Rao, Yujing
    Chen, Xiaoxian
    Fan, Xiaoyan
    Bai, Haicheng
    Xing, Lin
    Zhou, Chengjiang
    Yang, Yang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [5] AAM-ORB: affine attention module on ORB for conditioned feature matching
    Song, Shaojing
    Ai, Luxia
    Tang, Pan
    Miao, Zhiqing
    Gu, Yang
    Chai, Yu
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (05) : 2351 - 2358
  • [6] Meta network attention-based feature matching for heterogeneous defect prediction
    Nevendra, Meetesh
    Singh, Pradeep
    AUTOMATED SOFTWARE ENGINEERING, 2025, 32 (01)
  • [7] Single image super-resolution based on trainable feature matching attention network
    Chen, Qizhou
    Shao, Qing
    PATTERN RECOGNITION, 2024, 149
  • [8] Feature vector field and feature matching
    Wu, F. C.
    Wang, Z. H.
    Wang, X. G.
    PATTERN RECOGNITION, 2010, 43 (10) : 3273 - 3281
  • [9] An improved matching algorithm for feature points matching
    Yan, Yuanhui
    Xia, Haiying
    Huang, Siqi
    Xiao, Wenjing
    2014 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (ICSPCC), 2014, : 292 - 296