Transformer With Linear-Window Attention for Feature Matching

Times cited: 0
Authors
Shen, Zhiwei [1 ,2 ]
Kong, Bin [1 ,3 ,4 ]
Dong, Xiaoyu [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Hefei Inst Intelligent Machines, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei Inst Phys Sci, Hefei 230026, Peoples R China
[3] Anhui Engn Lab Intelligent Driving Technol & Appli, Hefei 230088, Peoples R China
[4] Chinese Acad Sci, Innovat Res Inst Robot & Intelligent Mfg Hefei, Hefei 230088, Peoples R China
Keywords
Feature extraction; Transformers; Task analysis; Computational modeling; Computational efficiency; Memory management; Visualization; Feature matching; visual transformer; detector-free; computational complexity; low-texture;
DOI
10.1109/ACCESS.2023.3328855
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
A transformer can capture long-term dependencies through its attention mechanism and can therefore be applied to various vision tasks. However, its quadratic computational complexity is a major obstacle in vision tasks that require accurate predictions. To address this limitation, this study introduces linear-window attention (LWA), a new attention model for vision transformers. LWA restricts self-attention to nonoverlapping local windows and represents it as a linear dot product of kernel feature maps. Furthermore, by exploiting the associative property of matrix products, the computational complexity of each window is reduced from quadratic to linear. In addition, we applied LWA to feature matching to construct a coarse-to-fine, detector-free feature matching method, called transformer with linear-window attention for feature matching (TRLWAM). At the coarse level, we extract dense pixel-level matches, and at the fine level, we obtain the final matching results via multi-head multilayer perceptron refinement. We demonstrated the effectiveness of LWA through replacement experiments. The results show that TRLWAM can extract dense matches from low-texture or repetitive-pattern regions in indoor environments and achieves excellent results at low computational cost on the MegaDepth and HPatches datasets. We believe the proposed LWA can provide new insights for transformer applications in vision tasks.
Pages: 121202-121211
Number of pages: 10
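
The abstract describes LWA as self-attention restricted to nonoverlapping local windows and expressed as a linear dot product of kernel feature maps, with per-window cost reduced from quadratic to linear via the associativity of matrix products. The following is a minimal sketch of that idea, not the authors' implementation: the elu(x)+1 kernel feature map, the window size, and the tensor shapes are assumptions chosen for illustration.

import torch

def linear_window_attention(q, k, v, window_size):
    # Sketch of linear attention computed inside nonoverlapping windows.
    # q, k, v: (batch, seq_len, dim); seq_len is assumed divisible by window_size.
    b, n, d = q.shape
    w = window_size
    # Partition the token sequence into nonoverlapping windows: (b, n // w, w, d).
    q = q.reshape(b, n // w, w, d)
    k = k.reshape(b, n // w, w, d)
    v = v.reshape(b, n // w, w, d)
    # Kernel feature map phi(x) = elu(x) + 1, a common choice in linear attention
    # (assumed here; the paper may use a different kernel).
    phi_q = torch.nn.functional.elu(q) + 1
    phi_k = torch.nn.functional.elu(k) + 1
    # Associativity of matrix products: form phi(K)^T V first, so each window costs
    # O(w * d^2) instead of the O(w^2 * d) of softmax attention.
    kv = torch.einsum('bnwd,bnwe->bnde', phi_k, v)            # (b, n//w, d, d)
    z = 1.0 / (torch.einsum('bnwd,bnd->bnw', phi_q, phi_k.sum(dim=2)) + 1e-6)
    out = torch.einsum('bnwd,bnde,bnw->bnwe', phi_q, kv, z)   # (b, n//w, w, d)
    return out.reshape(b, n, d)

# Example: 4096 coarse-level tokens of dimension 64, windows of 64 tokens.
x = torch.randn(2, 4096, 64)
print(linear_window_attention(x, x, x, window_size=64).shape)  # torch.Size([2, 4096, 64])

Computing phi(K)^T V before multiplying by phi(Q) is the ordering that makes the per-window cost linear in the number of tokens; reversing it would reintroduce the quadratic attention matrix.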