Joint spatio-temporal modeling for visual tracking

Cited by: 5
Authors
Sun, Yumei [1 ,2 ,3 ,4 ,5 ]
Tang, Chuanming [1 ,2 ,3 ,4 ,5 ]
Luo, Hui [1 ,2 ,3 ,4 ,5 ]
Li, Qingqing [1 ,2 ,3 ,5 ]
Peng, Xiaoming [5 ]
Zhang, Jianlin [1 ,2 ,3 ,4 ,5 ]
Li, Meihui [1 ,2 ,3 ,5 ]
Wei, Yuxing [1 ,2 ,3 ,5 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 108408, Peoples R China
[2] Chinese Acad Sci, Key Lab Opt Engn, Chengdu 610209, Peoples R China
[3] Chinese Acad Sci, Inst Opt & Elect, Chengdu 610209, Peoples R China
[4] Chinese Acad Sci, Natl Key Lab Opt Field Manipulat Sci & Technol, Chengdu 610209, Peoples R China
[5] Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu 611731, Peoples R China
Keywords
Visual tracking; Siamese trackers; Sequence prediction; Spatio-temporal model
DOI
10.1016/j.knosys.2023.111206
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Similarity-based approaches have made significant progress in visual object tracking (VOT). Although these methods work well in simple scenes, they ignore the continuous spatio-temporal connection of the object across the video sequence. As a result, tracking by spatial matching alone can fail in the presence of distractors and occlusion. In this paper, we propose a joint spatio-temporal modeling tracker named STTrack, which implicitly builds continuous connections between the temporal and spatial aspects of the sequence. Specifically, we first design a time-sequence iteration strategy (TSIS) that concentrates on the temporal connection of the object in the video sequence. Then, we propose a novel spatial-temporal interaction Transformer network (STIN) to capture the spatio-temporal correlation of the object between frames. The proposed STIN module is robust to object occlusion because it explores the dependencies among the object's dynamic state changes. Finally, we introduce a spatio-temporal query that suppresses distractors by iteratively propagating the target prior. Extensive experiments on six tracking benchmark datasets demonstrate that the proposed STTrack achieves excellent performance while operating in real time. The code is publicly available at https://github.com/nubsym/STTrack.
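The idea of iteratively propagating a target prior through a query can be illustrated with a toy sketch. This is not the STTrack implementation: the function names (`attend`, `track`), the single-head dot-product attention, the `momentum` blending rule, and the two-dimensional features are all assumptions made for illustration. The sketch only shows how a query carried from frame to frame can keep following a target whose position changes while a distractor is present.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Single-head dot-product attention of one query over one frame.

    `features` is a list of per-location feature vectors (hypothetical
    stand-ins for backbone features). Returns the attention weights and
    the attention-pooled feature vector.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, feat)) / scale
              for feat in features]
    weights = softmax(scores)
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features))
              for d in range(len(query))]
    return weights, pooled


def track(frames, init_query, momentum=0.5):
    """Carry a query across frames and pick one location per frame.

    After each frame the query is blended with the pooled feature, so
    the target prior is propagated to the next frame (assumed update
    rule, chosen only for simplicity).
    """
    query = list(init_query)
    picks = []
    for features in frames:
        weights, pooled = attend(query, features)
        picks.append(max(range(len(weights)), key=weights.__getitem__))
        query = [momentum * q + (1.0 - momentum) * p
                 for q, p in zip(query, pooled)]
    return picks


# Two frames, two locations each; the target-like feature [1, 0] moves
# from location 0 to location 1 while a distractor [0, 1] swaps places.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
print(track(frames, [1.0, 0.0]))
```

In this sketch the blended query still resembles the target after the first frame, so the second frame's attention peaks at the target's new location instead of the distractor. A real tracker would learn the update and operate on dense feature maps; the fixed `momentum` here is only a placeholder for that learned behavior.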
Pages: 10