Sparse Transformer-Based Sequence Generation for Visual Object Tracking

Cited by: 0
Authors
Tian, Dan [1 ]
Liu, Dong-Xin [2 ]
Wang, Xiao [2 ]
Hao, Ying [2 ]
Affiliations
[1] Shenyang University, School of Intelligent Systems Science & Engineering, Shenyang 110044, Liaoning, People's Republic of China
[2] Shenyang University, School of Information Engineering, Shenyang 110044, Liaoning, People's Republic of China
Keywords
Transformers; Visualization; Target tracking; Decoding; Feature extraction; Attention mechanisms; Object tracking; Training; Interference; Attention mechanism; sequence generation; sparse attention; visual object tracking; vision transformer
DOI
10.1109/ACCESS.2024.3482468
Chinese Library Classification (CLC): TP [automation technology; computer technology]
Discipline classification code: 0812
Abstract
In visual object tracking, attention mechanisms can flexibly and efficiently model complex dependencies and global context, which improves tracking accuracy. However, in scenes dominated by background clutter or other distracting information, global attention can dilute the weight of important information and allocate unnecessary attention to the background, degrading tracking performance. To alleviate this problem, this paper proposes a visual object tracking framework based on a sparse transformer. The framework is a simple encoder-decoder structure that predicts the target autoregressively, eliminating the additional head network and simplifying the tracking architecture. Furthermore, we introduce a Sparse Attention Mechanism (SMA) in the cross-attention layer of the decoder. Unlike traditional attention mechanisms, SMA attends only to the top-K pixel values most relevant to the current pixel when computing attention weights. This lets the model concentrate on key information and improves foreground-background discrimination, resulting in more accurate and robust tracking. We evaluate our method on six tracking benchmarks, and the experimental results demonstrate its effectiveness.
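A minimal sketch of the top-K sparse cross-attention the abstract describes, assuming a PyTorch-style scaled dot-product attention between decoder queries and encoder keys/values. This is not the authors' implementation; the function name topk_sparse_attention, the tensor shapes, and the choice of top_k are illustrative assumptions.

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k):
    # q: (B, Lq, D) decoder queries; k, v: (B, Lk, D) encoder keys/values.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (B, Lq, Lk) similarity scores
    top_k = min(top_k, scores.size(-1))
    # Per-query threshold: the K-th largest score for each query position.
    kth = scores.topk(top_k, dim=-1).values[..., -1:]  # (B, Lq, 1)
    # Mask everything below the threshold so it receives zero weight after softmax.
    scores = scores.masked_fill(scores < kth, float("-inf"))
    attn = F.softmax(scores, dim=-1)                   # sparse attention weights
    return attn @ v                                    # (B, Lq, D)

# Usage sketch with hypothetical sizes: 16 decoder tokens attend to
# 256 encoder tokens, keeping only the 32 most relevant keys per query.
q = torch.randn(2, 16, 64)
k = torch.randn(2, 256, 64)
v = torch.randn(2, 256, 64)
out = topk_sparse_attention(q, k, v, top_k=32)         # (2, 16, 64)

Masking scores below the per-query K-th largest value before the softmax drives the remaining weights to zero, which is one common way to realize top-K sparse attention; note that ties at the threshold may keep slightly more than K positions.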
Pages: 154418-154425
Page count: 8