MatchFormer: Interleaving Attention in Transformers for Feature Matching

Times cited: 24

Authors:

Wang, Qing [1]

Zhang, Jiaming [1]

Yang, Kailun [1]

Peng, Kunyu [1]

Stiefelhagen, Rainer [1]

Affiliation:

[1] Karlsruhe Inst Technol, Karlsruhe, Germany

Source:

COMPUTER VISION - ACCV 2022, PT III | 2023 / Vol. 13843

Keywords:

Feature matching; Vision transformers;

DOI:

10.1007/978-3-031-26313-2_16

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Local feature matching is a computationally intensive task at the subpixel level. While detector-based methods coupled with feature descriptors struggle in low-texture scenes, CNN-based methods with a sequential extract-to-match pipeline fail to make use of the matching capacity of the encoder and tend to overburden the decoder for matching. In contrast, we propose a novel hierarchical extract-and-match transformer, termed MatchFormer. Inside each stage of the hierarchical encoder, we interleave self-attention for feature extraction and cross-attention for feature matching, yielding a human-intuitive extract-and-match scheme. Such a match-aware encoder relieves the overloaded decoder and makes the model highly efficient. Further, combining self- and cross-attention on multi-scale features in a hierarchical architecture improves matching robustness, particularly in low-texture indoor scenes or when less outdoor training data is available. Thanks to this strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision. Compared to the previous best method in indoor pose estimation, our lite MatchFormer requires only 45% of the GFLOPs, yet achieves a +1.3% precision gain and a 41% boost in running speed. The large MatchFormer achieves state-of-the-art results on four different benchmarks: indoor pose estimation (ScanNet), outdoor pose estimation (MegaDepth), homography estimation and image matching (HPatches), and visual localization (InLoc).
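The interleaved extract-and-match idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: every name here (`attention`, `extract_and_match`, `merge_tokens`, `hierarchical_encoder`) is hypothetical, attention is single-head with random weights, and layer normalization, feed-forward layers, positional encoding, and any efficient attention variant a real model would use are omitted. Within each stage, both images' tokens first pass through self-attention (extraction) and then cross-attention against the other image (matching). `merge_tokens`, which averages adjacent token pairs, is an assumed stand-in for the learned downsampling that would produce the coarser scales of a hierarchical encoder.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query_feats, context_feats, wq, wk, wv):
    # Single-head scaled dot-product attention: queries come from
    # query_feats, keys and values come from context_feats.
    q = query_feats @ wq
    k = context_feats @ wk
    v = context_feats @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v

def extract_and_match(feat_a, feat_b, params):
    # Self-attention: each image attends to its own tokens (extraction).
    # The same weights are applied to both images (shared-weight assumption).
    feat_a = feat_a + attention(feat_a, feat_a, *params["self"])
    feat_b = feat_b + attention(feat_b, feat_b, *params["self"])
    # Cross-attention: each image attends to the other image (matching).
    # Both updates read the post-self-attention features.
    out_a = feat_a + attention(feat_a, feat_b, *params["cross"])
    out_b = feat_b + attention(feat_b, feat_a, *params["cross"])
    return out_a, out_b

def merge_tokens(feats):
    # Hypothetical stand-in for learned downsampling: average adjacent
    # token pairs to halve the token count (a trailing odd token is dropped).
    n = feats.shape[0] // 2 * 2
    return feats[:n].reshape(-1, 2, feats.shape[1]).mean(axis=1)

def make_params(rng, dim):
    # Random (wq, wk, wv) projections for the self and cross layers;
    # a trained model would learn these instead.
    return {
        name: tuple(rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
        for name in ("self", "cross")
    }

def hierarchical_encoder(feat_a, feat_b, num_stages, rng):
    # Each stage interleaves extraction and matching, records the
    # stage outputs, then coarsens both token sets for the next stage.
    outputs = []
    for _ in range(num_stages):
        params = make_params(rng, feat_a.shape[1])
        feat_a, feat_b = extract_and_match(feat_a, feat_b, params)
        outputs.append((feat_a, feat_b))
        feat_a, feat_b = merge_tokens(feat_a), merge_tokens(feat_b)
    return outputs

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 32))  # 64 tokens of width 32 for image A
b = rng.standard_normal((48, 32))  # 48 tokens of width 32 for image B
stages = hierarchical_encoder(a, b, num_stages=3, rng=rng)
print([(x.shape, y.shape) for x, y in stages])
# → [((64, 32), (48, 32)), ((32, 32), (24, 32)), ((16, 32), (12, 32))]
```

The point of the sketch is where the cross-attention sits: it runs inside every encoder stage, at every scale, rather than only after the final stage as in a sequential extract-to-match pipeline, so the encoder output already carries cross-image information before any decoder sees it.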


Pages: 256-273

Page count: 18
