Joint Spatio-Temporal Similarity and Discrimination Learning for Visual Tracking

Authors
Liang, Yanjie [1]
Chen, Haosheng [2]
Wu, Qiangqiang [3]
Xia, Changqun [1]
Li, Jia [4]
Affiliations
[1] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing Key Lab Image Cognit, Chongqing 400065, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Target tracking; Location awareness; Correlation; Visualization; Learning systems; Circuits and systems; Transformers; Video object tracking; joint learning; spatio-temporal similarity; spatio-temporal discrimination; adaptive response map fusion;
DOI
10.1109/TCSVT.2024.3377379
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
Visual tracking is a task of localizing a target unceasingly in a video with an initial target state at the first frame. The limited target information makes this problem an extremely challenging task. Existing tracking methods either perform matching based similarity learning or optimization based discrimination reasoning. However, these two types of tracking methods suffer from the problem of ineffectiveness for distinguishing target objects from background distractors and the problem of insufficiency in maintaining spatio-temporal consistency among successive frames, respectively. In this paper, we design a joint spatio-temporal similarity and discrimination learning (STSDL) framework for accurate and robust tracking. The designed framework is composed of two complementary branches: a similarity learning branch and a discrimination learning branch. The similarity learning branch uses an effective transformer encoder-decoder to gather rich spatio-temporal context information to generate a similarity map. In parallel, the discrimination learning branch exploits an efficient model predictor to train a target model to produce a discriminative map. Finally, the similarity map and the discriminative map are adaptively fused for accurate and robust target localization. Experimental results on six prevalent datasets demonstrate that the proposed STSDL can obtain satisfactory results, while it retains a real-time tracking speed of 50 FPS on a single GPU.
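To make the final fusion step in the abstract concrete, the following is a minimal sketch of adaptively blending a similarity map with a discriminative map and reading off the target location. The abstract does not state how the fusion weights are computed, so the weighting rule here (peak value minus mean value of each map) is purely an illustrative assumption, and the names `peak_confidence`, `fuse_maps`, and `locate_peak` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Map = List[List[float]]


def peak_confidence(response: Map) -> float:
    """Peak minus mean of a response map (an assumed sharpness heuristic)."""
    values = [v for row in response for v in row]
    return max(values) - sum(values) / len(values)


def fuse_maps(similarity: Map, discriminative: Map) -> Tuple[Map, float]:
    """Blend two equally sized maps; return the fused map and the similarity weight."""
    c_sim = peak_confidence(similarity)
    c_dis = peak_confidence(discriminative)
    total = c_sim + c_dis
    # Assumption: fall back to an equal blend when both maps are flat.
    w = 0.5 if total <= 0 else c_sim / total
    fused = [
        [w * s + (1.0 - w) * d for s, d in zip(row_s, row_d)]
        for row_s, row_d in zip(similarity, discriminative)
    ]
    return fused, w


def locate_peak(response: Map) -> Tuple[int, int]:
    """Return the (row, col) index of the largest response."""
    best = max(
        ((v, r, c) for r, row in enumerate(response) for c, v in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


# Toy 3x3 maps standing in for the outputs of the two branches.
similarity_map = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.6], [0.1, 0.2, 0.1]]
discriminative_map = [[0.0, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.0]]

fused_map, similarity_weight = fuse_maps(similarity_map, discriminative_map)
target_row, target_col = locate_peak(fused_map)
print(similarity_weight, (target_row, target_col))
```

In this sketch the map with the sharper peak receives the larger weight, so a branch confused by a distractor (a flatter or multi-peaked map) contributes less to the fused result. The actual STSDL fusion may well be learned rather than hand-crafted.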
Pages: 7284-7300
Page count: 17