Transformer With Linear-Window Attention for Feature Matching

Cited by: 0
Authors
Shen, Zhiwei [1 ,2 ]
Kong, Bin [1 ,3 ,4 ]
Dong, Xiaoyu [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Hefei Inst Intelligent Machines, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei Inst Phys Sci, Hefei 230026, Peoples R China
[3] Anhui Engn Lab Intelligent Driving Technol & Appli, Hefei 230088, Peoples R China
[4] Chinese Acad Sci, Innovat Res Inst Robot & Intelligent Mfg Hefei, Hefei 230088, Peoples R China
Keywords
Feature extraction; Transformers; Task analysis; Computational modeling; Computational efficiency; Memory management; Visualization; Feature matching; visual transformer; detector-free; computational complexity; low-texture
DOI
10.1109/ACCESS.2023.3328855
CLC Classification
TP [Automation and Computer Technology];
Discipline Code
0812 ;
Abstract
A transformer can capture long-term dependencies through an attention mechanism and can therefore be applied to various vision tasks. However, its quadratic computational complexity is a major obstacle in vision tasks that require accurate predictions. To address this limitation, this study introduces linear-window attention (LWA), a new attention model for vision transformers. The transformer computes self-attention that is restricted to nonoverlapping local windows and expressed as a linear dot product of kernel feature maps. Furthermore, the computational complexity of each window is reduced from quadratic to linear by exploiting the associativity of matrix products. In addition, we applied LWA to feature matching to construct a coarse-to-fine detector-free feature matching method, called transformer with linear-window attention for feature matching (TRLWAM). At the coarse level, we extracted dense pixel-level matches; at the fine level, we obtained the final matching results via multi-head multilayer perceptron refinement. We demonstrated the effectiveness of LWA through replacement experiments. The results showed that TRLWAM can extract dense matches from low-texture or repetitive-pattern regions in indoor environments, and achieves excellent results at a low computational cost on the MegaDepth and HPatches datasets. We believe the proposed LWA can provide new insights for transformer applications in vision tasks.
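The abstract's core idea, attention restricted to nonoverlapping windows and linearized via a kernel feature map, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common elu+1 kernel from linear-attention transformers and a simple sum-based normalizer; the paper's exact kernel, normalization, and window layout may differ.

```python
import numpy as np

def elu_feature_map(x):
    # Kernel feature map phi(x) = elu(x) + 1 (a common choice in linear
    # attention; the paper's actual kernel is an assumption here).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_window_attention(q, k, v, window):
    """Self-attention computed independently inside nonoverlapping windows,
    linearized via the associativity of matrix products:
        (phi(Q) phi(K)^T) V  ==  phi(Q) (phi(K)^T V)
    so each window costs O(w * d^2) instead of O(w^2 * d).
    q, k, v: (seq_len, dim) arrays; seq_len must be divisible by window."""
    n, d = q.shape
    out = np.empty_like(v)
    for s in range(0, n, window):
        qw = elu_feature_map(q[s:s + window])   # (w, d)
        kw = elu_feature_map(k[s:s + window])   # (w, d)
        vw = v[s:s + window]                    # (w, d)
        kv = kw.T @ vw                          # (d, d) summary, linear in w
        z = qw @ kw.sum(axis=0) + 1e-6          # per-query normalizer, (w,)
        out[s:s + window] = (qw @ kv) / z[:, None]
    return out
```

Because phi is positive, the normalizer never vanishes, and the linearized form is numerically identical to first materializing the w-by-w attention matrix within each window.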
Pages: 121202 - 121211
Page count: 10
Related Papers
50 items total
  • [31] Transformer Driven Matching Selection Mechanism for Multi-Label Image Classification
    Wu, Yanan
    Feng, Songhe
    Zhao, Gongpei
    Jin, Yi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (02) : 924 - 937
  • [32] MR-Matcher: A Multirouting Transformer-Based Network for Accurate Local Feature Matching
    Jiang, Zhiqiang
    Wang, Ke
    Kong, Qingjia
    Dai, Kun
    Xie, Tao
    Qin, Zhonghao
    Li, Ruifeng
    Perner, Petra
    Zhao, Lijun
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73
  • [33] Matching Multi-Scale Feature Sets in Vision Transformer for Few-Shot Classification
    Song, Mingchen
    Yao, Fengqin
    Zhong, Guoqiang
    Ji, Zhong
    Zhang, Xiaowei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 12638 - 12651
  • [34] Positional Attention Guided Transformer-Like Architecture for Visual Question Answering
    Mao, Aihua
    Yang, Zhi
    Lin, Ken
    Xuan, Jun
    Liu, Yong-Jin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6997 - 7009
  • [35] Video Sparse Transformer With Attention-Guided Memory for Video Object Detection
    Fujitake, Masato
    Sugimoto, Akihiro
    IEEE ACCESS, 2022, 10 : 65886 - 65900
  • [36] EViT: An Eagle Vision Transformer With Bi-Fovea Self-Attention
    Shi, Yulong
    Sun, Mingwei
    Wang, Yongshuai
    Ma, Jiahao
    Chen, Zengqiang
    IEEE TRANSACTIONS ON CYBERNETICS, 2025, 55 (03) : 1288 - 1300
  • [37] Light Self-Gaussian-Attention Vision Transformer for Hyperspectral Image Classification
    Ma, Chao
    Wan, Minjie
    Wu, Jian
    Kong, Xiaofang
    Shao, Ajun
    Wang, Fan
    Chen, Qian
    Gu, Guohua
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [38] GI-Transformer: A Transformer Model With Feature Interaction for Nonintrusive Load Monitoring
    Liu, Xiaoyang
    Jiang, Tao
    Luo, Haiyue
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, 74
  • [39] Conditional Feature Learning Based Transformer for Text-Based Person Search
    Gao, Chenyang
    Cai, Guanyu
    Jiang, Xinyang
    Zheng, Feng
    Zhang, Jun
    Gong, Yifei
    Lin, Fangzhou
    Sun, Xing
    Bai, Xiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6097 - 6108
  • [40] Transformer-Based Band Regrouping With Feature Refinement for Hyperspectral Object Tracking
    Wang, Hanzheng
    Li, Wei
    Xia, Xiang-Gen
    Du, Qian
    Tian, Jing
    Shen, Qing
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62