Transformer With Linear-Window Attention for Feature Matching

Cited by: 0
Authors
Shen, Zhiwei [1 ,2 ]
Kong, Bin [1 ,3 ,4 ]
Dong, Xiaoyu [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Hefei Inst Intelligent Machines, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei Inst Phys Sci, Hefei 230026, Peoples R China
[3] Anhui Engn Lab Intelligent Driving Technol & Appli, Hefei 230088, Peoples R China
[4] Chinese Acad Sci, Innovat Res Inst Robot & Intelligent Mfg Hefei, Hefei 230088, Peoples R China
Keywords
Feature extraction; Transformers; Task analysis; Computational modeling; Computational efficiency; Memory management; Visualization; Feature matching; visual transformer; detector-free; computational complexity; low-texture;
DOI
10.1109/ACCESS.2023.3328855
CLC Classification Number
TP [Automation technology, computer technology];
Discipline Classification Code
0812
Abstract
A transformer can capture long-term dependencies through an attention mechanism and can therefore be applied to various vision tasks. However, its quadratic computational complexity is a major obstacle in vision tasks that require accurate predictions. To address this limitation, this study introduces linear-window attention (LWA), a new attention model for a vision transformer. The transformer computes self-attention restricted to nonoverlapping local windows and represents it as a linear dot product of kernel feature maps. Furthermore, the computational complexity of each window is reduced from quadratic to linear by exploiting the associativity of matrix products. In addition, we applied LWA to feature matching to construct a coarse-to-fine, detector-free feature matching method, called transformer with linear-window attention for feature matching (TRLWAM). At the coarse level, we extracted dense pixel-level matches, and at the fine level, we obtained the final matching results via multi-head multilayer perceptron refinement. We demonstrated the effectiveness of LWA through replacement experiments. The results showed that TRLWAM can extract dense matches from low-texture or repetitive-pattern regions in indoor environments and achieves excellent results at low computational cost on the MegaDepth and HPatches datasets. We believe the proposed LWA can provide new ideas for transformer applications in visual tasks.
Pages: 121202 - 121211
Number of pages: 10
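The abstract outlines the core mechanism of LWA: self-attention is restricted to nonoverlapping local windows, the softmax is replaced by a kernel feature map, and the matrix products are reordered so that each window costs linear rather than quadratic time in the window length. Below is a minimal PyTorch sketch of that idea; the kernel choice (elu(x) + 1), the window partitioning, and the function name linear_window_attention are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def linear_window_attention(q, k, v, window_size, eps=1e-6):
    """q, k, v: (batch, seq_len, dim); seq_len must be divisible by window_size."""
    b, n, d = q.shape
    w = window_size
    # Partition the token sequence into nonoverlapping windows: (b, n // w, w, d).
    q = q.reshape(b, n // w, w, d)
    k = k.reshape(b, n // w, w, d)
    v = v.reshape(b, n // w, w, d)
    # Kernel feature map (assumed here: elu(x) + 1) keeps attention weights positive.
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    # Associativity of matrix products: form K^T V first, a (d x d) matrix per
    # window, so the cost per window is O(w * d^2) instead of O(w^2 * d).
    kv = torch.einsum('bhwd,bhwe->bhde', k, v)                         # (b, windows, d, d)
    z = 1.0 / (torch.einsum('bhwd,bhd->bhw', q, k.sum(dim=2)) + eps)   # row normalizer
    out = torch.einsum('bhwd,bhde,bhw->bhwe', q, kv, z)                # (b, windows, w, d)
    return out.reshape(b, n, d)

# Example usage on a dummy flattened feature map.
x = torch.randn(2, 64, 32)                           # batch=2, 64 tokens, dim=32
y = linear_window_attention(x, x, x, window_size=8)
print(y.shape)                                       # torch.Size([2, 64, 32])
```

In this sketch, computing K^T V before applying the queries yields a small d x d matrix per window, which is what keeps the per-window cost linear in the number of tokens inside that window.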
Related Papers
(50 records in total)
  • [21] A Novel Transformer Network With Shifted Window Cross-Attention for Spatiotemporal Weather Forecasting
    Bojesomo, Alabi
    Almarzouqi, Hasan
    Liatsis, Panos
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 45 - 55
  • [22] FmCFA: a feature matching method for critical feature attention in multimodal images
    Liao, Yun
    Wu, Xuning
    Liu, Junhui
    Liu, Peiyu
    Pan, Zhixuan
    Duan, Qing
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [23] Progressive Feature Matching: Incremental Graph Construction and Optimization
    Lee, Sehyung
    Lim, Jongwoo
    Suh, Il Hong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29) : 6992 - 7005
  • [24] One Model to Rule Them all: A Universal Transformer for Biometric Matching
    Abdrakhmanova, Madina
    Yermekova, Assel
    Barko, Yuliya
    Ryspayev, Vladislav
    Jumadildayev, Medet
    Varol, Huseyin Atakan
    IEEE ACCESS, 2024, 12 : 96729 - 96739
  • [25] DHT: Dynamic Vision Transformer Using Hybrid Window Attention for Industrial Defect Images Classification
    Ding, Chao
    Tang, Donglin
    Zheng, Xianghua
    Wang, Qiang
    He, Yuanyuan
    Long, Zhang
    IEEE INSTRUMENTATION & MEASUREMENT MAGAZINE, 2023, 26 (02) : 19 - 28
  • [26] Embedded Heterogeneous Attention Transformer for Cross-Lingual Image Captioning
    Song, Zijie
    Hu, Zhenzhen
    Zhou, Yuanen
    Zhao, Ye
    Hong, Richang
    Wang, Meng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9008 - 9020
  • [27] Robust Feature Matching via Hierarchical Local Structure Visualization
    Chen, Jiaxuan
    Fan, Xiaoyan
    Chen, Shuang
    Yang, Yang
    Bai, Haicheng
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [28] Pcwin Transformer: Permuted Channel Window based Attention for Image Classification
    Li, Shibao
    Liu, Yixuan
    Wang, Zhaoyu
    Cui, Xuerong
    Zhang, Yunwu
    Jia, Zekun
    Zhu, Jinze
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [29] Enhancing Image Quality by Reducing Compression Artifacts Using Dynamic Window Swin Transformer
    Ma, Zhenchao
    Wang, Yixiao
    Tohidypour, Hamid Reza
    Nasiopoulos, Panos
    Leung, Victor C. M.
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2024, 14 (02) : 275 - 285
  • [30] Multi-Granularity Matching Transformer for Text-Based Person Search
    Bao, Liping
    Wei, Longhui
    Zhou, Wengang
    Liu, Lin
    Xie, Lingxi
    Li, Houqiang
    Tian, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4281 - 4293