Query-Based Object Visual Tracking with Parallel Sequence Generation

Times Cited: 0
Authors
Liu, Chang [1]
Zhang, Bin [1]
Bo, Chunjuan [2]
Wang, Dong [1]
Affiliations
[1] Dalian University of Technology, School of Information and Communication Engineering, Dalian 116024, China
[2] Dalian Minzu University, School of Information and Communication Engineering, Dalian 116600, China
Funding
National Natural Science Foundation of China;
Keywords
visual tracking; object tracking; transformer;
DOI
10.3390/s24154802
Chinese Library Classification (CLC)
O65 [Analytical Chemistry];
Discipline Codes
070302; 081704;
Abstract
Query decoders have been shown to achieve good performance in object detection. However, their performance on object tracking remains insufficient. Sequence-to-sequence learning has recently been explored in this context, with the idea of describing a target as a sequence of discrete tokens. In this study, we experimentally determine that, with an appropriate representation, a parallel approach that predicts the target coordinate sequence with a query decoder can achieve both good performance and high speed. We propose a concise query-based tracking framework, named QPSTrack, which predicts the target coordinate sequence in a parallel manner. A set of queries is designed, each responsible for a different coordinate of the tracked target; all the queries jointly represent a single target, rather than following the traditional one-to-one matching pattern between queries and targets. Moreover, we adopt an adaptive decoding scheme consisting of a one-layer adaptive decoder and learnable adaptive inputs for the decoder. This decoding scheme helps the queries better decode the template-guided search features. Furthermore, we explore the plain ViT-Base and ViT-Large architectures as well as the lightweight hierarchical LeViT architecture as the encoder backbone, yielding a family of three variants in total. All the trackers obtain a good trade-off between speed and performance; for instance, our tracker QPSTrack-B256 with the ViT-Base encoder achieves a 69.1% AUC on the LaSOT benchmark at 104.8 FPS.
Pages: 16
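
To make the idea in the abstract concrete, below is a minimal, illustrative PyTorch-style sketch, not the authors' implementation: a small set of learnable queries, one per bounding-box coordinate, is decoded in parallel by a single decoder layer over the template-guided search features and classified into discrete coordinate bins. All names and hyperparameters here (ParallelCoordinateDecoder, num_bins, and the plain nn.TransformerDecoderLayer standing in for the paper's adaptive decoder and learnable adaptive inputs) are assumptions made for this sketch.

# Illustrative sketch only -- NOT the released QPSTrack code.
import torch
import torch.nn as nn


class ParallelCoordinateDecoder(nn.Module):
    def __init__(self, embed_dim: int = 256, num_coords: int = 4, num_bins: int = 1000):
        super().__init__()
        # One learnable query per target coordinate (e.g., x, y, w, h);
        # together the queries describe a single target.
        self.coord_queries = nn.Parameter(torch.randn(num_coords, embed_dim))
        # A single decoder layer: a plain stand-in for the paper's one-layer
        # adaptive decoder with learnable adaptive inputs.
        layer = nn.TransformerDecoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        # Each decoded query is classified into one of num_bins discrete
        # coordinate tokens (coordinates as a token sequence).
        self.coord_head = nn.Linear(embed_dim, num_bins)

    def forward(self, search_features: torch.Tensor) -> torch.Tensor:
        # search_features: (B, N, C) template-guided search-region tokens
        # produced by the ViT/LeViT encoder.
        b = search_features.size(0)
        queries = self.coord_queries.unsqueeze(0).expand(b, -1, -1)   # (B, 4, C)
        decoded = self.decoder(tgt=queries, memory=search_features)   # (B, 4, C)
        # All coordinate tokens are predicted in one parallel pass,
        # instead of autoregressively one token at a time.
        return self.coord_head(decoded)                               # (B, 4, num_bins)


# Example usage with random features standing in for encoder output.
if __name__ == "__main__":
    feats = torch.randn(2, 16 * 16, 256)        # (batch, tokens, dim)
    head = ParallelCoordinateDecoder()
    logits = head(feats)                        # shape: (2, 4, 1000)
    coord_bins = logits.argmax(dim=-1)          # discrete coordinate tokens
    print(logits.shape, coord_bins.shape)

In this sketch, a single forward pass yields all four coordinate tokens at once, which is what makes the decoding parallel rather than autoregressive.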