Sparse Transformer-Based Sequence Generation for Visual Object Tracking

Cited: 0
|
Authors
Tian, Dan [1 ]
Liu, Dong-Xin [2 ]
Wang, Xiao [2 ]
Hao, Ying [2 ]
Affiliations
[1] Shenyang Univ, Sch Intelligent Syst Sci & Engn, Shenyang 110044, Liaoning, Peoples R China
[2] Shenyang Univ, Sch Informat Engn, Shenyang 110044, Liaoning, Peoples R China
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Transformers; Visualization; Target tracking; Decoding; Feature extraction; Attention mechanisms; Object tracking; Training; Interference; Attention mechanism; sequence generation; sparse attention; visual object tracking; vision transformer;
DOI
10.1109/ACCESS.2024.3482468
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline code
0812
Abstract
In visual object tracking, attention mechanisms can flexibly and efficiently model complex dependencies and global context, which improves tracking accuracy. However, in scenes dominated by background or other distracting information, global attention can dilute the weight of important information and allocate unnecessary attention to the background, degrading tracking performance. To alleviate this problem, this paper proposes a visual object tracking framework based on a sparse transformer. Our tracking framework is a simple encoder-decoder structure that predicts the target in an autoregressive manner, eliminating the additional head network and simplifying the tracking architecture. Furthermore, we introduce a Sparse Attention Mechanism (SMA) in the cross-attention layer of the decoder. Unlike traditional attention mechanisms, SMA attends only to the top-K pixel values most relevant to the current pixel when computing attention weights. This allows the model to focus on key information and improves foreground-background discrimination, resulting in more accurate and robust tracking. We evaluate our method on six tracking benchmarks, and the experimental results demonstrate its effectiveness.
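The top-K idea described in the abstract can be illustrated with a minimal sketch: compute ordinary scaled dot-product cross-attention scores, then mask every score except the k largest per query before the softmax. This is a generic NumPy illustration of top-K sparse attention, not the authors' implementation; the function name, shapes, and `k` parameter are assumptions for the example.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax; -inf entries become exactly 0."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def topk_sparse_attention(Q, K, V, k):
    """Cross-attention where each query keeps only its top-k key scores.

    Q: (n_q, d) queries, K: (n_k, d) keys, V: (n_k, d_v) values.
    Returns the attended output and the sparse weight matrix.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # (n_q, n_k) similarities
    topk_idx = np.argsort(scores, axis=-1)[:, -k:]   # indices of k best keys per query
    masked = np.full_like(scores, -np.inf)           # everything else is dropped
    np.put_along_axis(masked, topk_idx,
                      np.take_along_axis(scores, topk_idx, axis=-1), axis=-1)
    weights = softmax(masked)                        # exactly k nonzeros per row
    return weights @ V, weights
```

Because the non-top-K scores are set to negative infinity before the softmax, each query distributes all of its attention mass over only k keys, which is what keeps background positions from diluting the weights of the relevant ones.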
Pages: 154418-154425
Page count: 8
Related Papers
50 records in total
  • [31] Visual Object Tracking With Discriminative Filters and Siamese Networks: A Survey and Outlook
    Javed, Sajid
    Danelljan, Martin
    Khan, Fahad Shahbaz
    Khan, Muhammad Haris
    Felsberg, Michael
    Matas, Jiri
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (05) : 6552 - 6574
  • [32] Feature Aggregation Networks Based on Dual Attention Capsules for Visual Object Tracking
    Cao, Yi
    Ji, Hongbing
    Zhang, Wenbo
    Shirani, Shahram
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (02) : 674 - 689
  • [33] Complementary Discriminative Correlation Filters Based on Collaborative Representation for Visual Object Tracking
    Zhu, Xue-Feng
    Wu, Xiao-Jun
    Xu, Tianyang
    Feng, Zhen-Hua
    Kittler, Josef
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (02) : 557 - 568
  • [34] Reliable object tracking by multimodal hybrid feature extraction and transformer-based fusion
    Sun, Hongze
    Liu, Rui
    Cai, Wuque
    Wang, Jun
    Wang, Yue
    Tang, Huajin
    Cui, Yan
    Yao, Dezhong
    Guo, Daqing
    NEURAL NETWORKS, 2024, 178
  • [35] Adaptive sparse attention-based compact transformer for object tracking
    Pan, Fei
    Zhao, Lianyu
    Wang, Chenglin
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [36] Transformer-Based Maneuvering Target Tracking
    Zhao, Guanghui
    Wang, Zelin
    Huang, Yixiong
    Zhang, Huirong
    Ma, Xiaojing
    SENSORS, 2022, 22 (21)
  • [37] Visual Object Tracking by Hierarchical Attention Siamese Network
    Shen, Jianbing
    Tang, Xin
    Dong, Xingping
    Shao, Ling
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (07) : 3068 - 3080
  • [38] RPformer: A Robust Parallel Transformer for Visual Tracking in Complex Scenes
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [39] TLSH-MOT: Drone-View Video Multiple Object Tracking via Transformer-Based Locally Sensitive Hash
    Yuan, Yubin
    Wu, Yiquan
    Zhao, Langyue
    Liu, Yuqi
    Pang, Yaxuan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [40] Visual Object Tracking Based on Mutual Learning Between Cohort Multiscale Feature-Fusion Networks With Weighted Loss
    Fang, Jiaojiao
    Liu, Guizhong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (03) : 1055 - 1065