Query-Based Object Visual Tracking with Parallel Sequence Generation

Times Cited: 0
Authors
Liu, Chang [1]
Zhang, Bin [1]
Bo, Chunjuan [2]
Wang, Dong [1]
Affiliations
[1] Dalian University of Technology, School of Information and Communication Engineering, Dalian 116024, China
[2] Dalian Minzu University, School of Information and Communication Engineering, Dalian 116600, China
Funding
National Natural Science Foundation of China;
Keywords
visual tracking; object tracking; transformer;
DOI
10.3390/s24154802
Chinese Library Classification (CLC)
O65 [Analytical Chemistry];
Discipline Codes
070302; 081704;
Abstract
Query decoders have been shown to achieve good performance in object detection. However, their performance on object tracking remains insufficient. Sequence-to-sequence learning has recently been explored in this context, with the idea of describing a target as a sequence of discrete tokens. In this study, we experimentally determine that, with an appropriate representation, a parallel approach that predicts the target coordinate sequence with a query decoder can achieve both good performance and high speed. We propose a concise query-based tracking framework, named QPSTrack, which predicts the target coordinate sequence in a parallel manner. A set of queries is designed, each responsible for a different coordinate of the tracked target; all the queries jointly represent a single target, rather than following the traditional one-to-one matching pattern between queries and targets. Moreover, we adopt an adaptive decoding scheme consisting of a one-layer adaptive decoder and learnable adaptive inputs for the decoder. This decoding scheme helps the queries better decode the template-guided search features. Furthermore, we explore the plain ViT-Base and ViT-Large architectures as well as the lightweight hierarchical LeViT architecture as the encoder backbone, yielding a family of three variants in total. All the trackers obtain a good trade-off between speed and performance; for instance, our tracker QPSTrack-B256 with the ViT-Base encoder achieves a 69.1% AUC on the LaSOT benchmark at 104.8 FPS.
Pages: 16
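
To make the idea in the abstract concrete, below is a minimal, illustrative PyTorch-style sketch, not the authors' implementation: a small set of learnable queries, one per bounding-box coordinate, is decoded in parallel by a single decoder layer over the template-guided search features and classified into discrete coordinate bins. All names and hyperparameters here (ParallelCoordinateDecoder, num_bins, and the plain nn.TransformerDecoderLayer standing in for the paper's adaptive decoder and learnable adaptive inputs) are assumptions made for this sketch.

# Illustrative sketch only -- NOT the released QPSTrack code.
import torch
import torch.nn as nn


class ParallelCoordinateDecoder(nn.Module):
    def __init__(self, embed_dim: int = 256, num_coords: int = 4, num_bins: int = 1000):
        super().__init__()
        # One learnable query per target coordinate (e.g., x, y, w, h);
        # together the queries describe a single target.
        self.coord_queries = nn.Parameter(torch.randn(num_coords, embed_dim))
        # A single decoder layer: a plain stand-in for the paper's one-layer
        # adaptive decoder with learnable adaptive inputs.
        layer = nn.TransformerDecoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        # Each decoded query is classified into one of num_bins discrete
        # coordinate tokens (coordinates as a token sequence).
        self.coord_head = nn.Linear(embed_dim, num_bins)

    def forward(self, search_features: torch.Tensor) -> torch.Tensor:
        # search_features: (B, N, C) template-guided search-region tokens
        # produced by the ViT/LeViT encoder.
        b = search_features.size(0)
        queries = self.coord_queries.unsqueeze(0).expand(b, -1, -1)   # (B, 4, C)
        decoded = self.decoder(tgt=queries, memory=search_features)   # (B, 4, C)
        # All coordinate tokens are predicted in one parallel pass,
        # instead of autoregressively one token at a time.
        return self.coord_head(decoded)                               # (B, 4, num_bins)


# Example usage with random features standing in for encoder output.
if __name__ == "__main__":
    feats = torch.randn(2, 16 * 16, 256)        # (batch, tokens, dim)
    head = ParallelCoordinateDecoder()
    logits = head(feats)                        # shape: (2, 4, 1000)
    coord_bins = logits.argmax(dim=-1)          # discrete coordinate tokens
    print(logits.shape, coord_bins.shape)

In this sketch, a single forward pass yields all four coordinate tokens at once, which is what makes the decoding parallel rather than autoregressive.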