Sparse Transformer-Based Sequence Generation for Visual Object Tracking

Cited by: 0
Authors
Tian, Dan [1 ]
Liu, Dong-Xin [2 ]
Wang, Xiao [2 ]
Hao, Ying [2 ]
Affiliations
[1] Shenyang Univ, Sch Intelligent Syst Sci & Engn, Shenyang 110044, Liaoning, Peoples R China
[2] Shenyang Univ, Sch Informat Engn, Shenyang 110044, Liaoning, Peoples R China
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Transformers; Visualization; Target tracking; Decoding; Feature extraction; Attention mechanisms; Object tracking; Training; Interference; Attention mechanism; sequence generation; sparse attention; visual object tracking; vision transformer;
DOI
10.1109/ACCESS.2024.3482468
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812
Abstract
In visual object tracking, attention mechanisms can flexibly and efficiently model complex dependencies and global information, which improves tracking accuracy. However, in scenes containing a large amount of background or other distracting information, global attention can dilute the weight of important information and allocate unnecessary attention to the background, thereby reducing tracking performance. To alleviate this problem, this paper proposes a visual object tracking framework based on a sparse transformer. Our tracking framework is a simple encoder-decoder structure that predicts the target in an autoregressive manner, eliminating the additional head network and simplifying the tracking architecture. Furthermore, we introduce a Sparse Attention Mechanism (SMA) in the cross-attention layer of the decoder. Unlike traditional attention mechanisms, SMA attends only to the top-K pixel values most relevant to the current pixel when computing attention weights. This allows the model to focus on key information and improves foreground-background discrimination, resulting in more accurate and robust tracking. We evaluate our method on six tracking benchmarks, and the experimental results demonstrate its effectiveness.
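The top-K sparse attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function name and shapes are hypothetical, and the idea shown is only the core step: mask all but each query's K largest similarity scores before the softmax, so the output attends to a few relevant keys instead of the whole feature map.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Single-head attention keeping only the k largest scores per query.

    Q: (n_q, d) queries; K, V: (n_k, d) keys/values; k: number of keys kept.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n_q, n_k) scaled dot products
    # k-th largest score in each row; everything below it is masked out
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Numerically stable softmax over the surviving scores
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # (n_q, d)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))      # e.g. 4 decoder query positions
K = rng.normal(size=(16, 8))     # e.g. 16 encoder feature positions
V = rng.normal(size=(16, 8))
out = topk_sparse_attention(Q, K, V, k=3)
print(out.shape)                 # (4, 8)
```

With `k = n_k` this reduces to ordinary dense attention; smaller `k` concentrates the weight budget on the most relevant positions, which is the effect the paper attributes to SMA.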
Pages: 154418-154425
Number of pages: 8