Sparse Transformer-Based Sequence Generation for Visual Object Tracking

Cited by: 0
Authors
Tian, Dan [1 ]
Liu, Dong-Xin [2 ]
Wang, Xiao [2 ]
Hao, Ying [2 ]
Affiliations
[1] Shenyang Univ, Sch Intelligent Syst Sci & Engn, Shenyang 110044, Liaoning, Peoples R China
[2] Shenyang Univ, Sch Informat Engn, Shenyang 110044, Liaoning, Peoples R China
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Transformers; Visualization; Target tracking; Decoding; Feature extraction; Attention mechanisms; Object tracking; Training; Interference; Attention mechanism; sequence generation; sparse attention; visual object tracking; vision transformer;
DOI
10.1109/ACCESS.2024.3482468
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812
Abstract
In visual object tracking, attention mechanisms can flexibly and efficiently model complex dependencies and global information, which improves tracking accuracy. However, in scenes containing a large amount of background or other distracting information, global attention can dilute the weight of important information and allocate unnecessary attention to the background, thereby reducing tracking performance. To alleviate this problem, this paper proposes a visual object tracking framework based on a sparse transformer. Our tracking framework is a simple encoder-decoder structure that predicts the target in an autoregressive manner, eliminating the additional head network and simplifying the tracking architecture. Furthermore, we introduce a Sparse Attention Mechanism (SMA) in the cross-attention layer of the decoder. Unlike traditional attention mechanisms, SMA attends only to the top-K pixel values most relevant to the current pixel when computing attention weights. This allows the model to focus on key information and improves foreground-background discrimination, resulting in more accurate and robust tracking. We evaluate our method on six tracking benchmarks, and the experimental results demonstrate its effectiveness.
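The top-K sparse attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function name and shapes are hypothetical, and the idea shown is only the core step: mask all but each query's K largest similarity scores before the softmax, so the output attends to a few relevant keys instead of the whole feature map.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Single-head attention keeping only the k largest scores per query.

    Q: (n_q, d) queries; K, V: (n_k, d) keys/values; k: number of keys kept.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n_q, n_k) scaled dot products
    # k-th largest score in each row; everything below it is masked out
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Numerically stable softmax over the surviving scores
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # (n_q, d)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))      # e.g. 4 decoder query positions
K = rng.normal(size=(16, 8))     # e.g. 16 encoder feature positions
V = rng.normal(size=(16, 8))
out = topk_sparse_attention(Q, K, V, k=3)
print(out.shape)                 # (4, 8)
```

With `k = n_k` this reduces to ordinary dense attention; smaller `k` concentrates the weight budget on the most relevant positions, which is the effect the paper attributes to SMA.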
Pages: 154418-154425
Number of pages: 8