Joint Spatio-Temporal Similarity and Discrimination Learning for Visual Tracking

Cited by: 0
Authors
Liang, Yanjie [1]
Chen, Haosheng [2]
Wu, Qiangqiang [3]
Xia, Changqun [1]
Li, Jia [4]
Affiliations
[1] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing Key Lab Image Cognit, Chongqing 400065, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Target tracking; Location awareness; Correlation; Visualization; Learning systems; Circuits and systems; Transformers; Video object tracking; joint learning; spatio-temporal similarity; spatio-temporal discrimination; adaptive response map fusion;
DOI
10.1109/TCSVT.2024.3377379
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808 ; 0809 ;
Abstract
Visual tracking is the task of continuously localizing a target in a video, given only its initial state in the first frame. This limited target information makes the problem extremely challenging. Existing tracking methods perform either matching-based similarity learning or optimization-based discrimination reasoning. The former are ineffective at distinguishing target objects from background distractors, while the latter are insufficient at maintaining spatio-temporal consistency across successive frames. In this paper, we design a joint spatio-temporal similarity and discrimination learning (STSDL) framework for accurate and robust tracking. The framework is composed of two complementary branches: a similarity learning branch and a discrimination learning branch. The similarity learning branch uses a transformer encoder-decoder to gather rich spatio-temporal context information and generate a similarity map. In parallel, the discrimination learning branch uses an efficient model predictor to train a target model that produces a discriminative map. Finally, the similarity map and the discriminative map are adaptively fused for accurate and robust target localization. Experimental results on six prevalent datasets demonstrate that the proposed STSDL obtains satisfactory results while retaining a real-time tracking speed of 50 FPS on a single GPU.
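The abstract does not say how the two maps are fused. As a rough illustration only, the sketch below assumes a simple confidence-weighted average, where each map's weight comes from how sharply its peak stands out from its mean. The function names (`peak_sharpness`, `fuse_response_maps`, `locate_target`) and the weighting rule are hypothetical and are not taken from the STSDL paper.

```python
# Illustrative sketch only: the weighting rule below is an assumption,
# not the fusion scheme used by STSDL.

def peak_sharpness(response):
    """Crude confidence score: peak value minus mean of a 2-D response map."""
    flat = [v for row in response for v in row]
    return max(flat) - sum(flat) / len(flat)


def fuse_response_maps(similarity_map, discriminative_map, eps=1e-8):
    """Blend two same-sized maps, weighting each by its peak sharpness.

    Returns the fused map and the weight given to the similarity map.
    """
    s_conf = peak_sharpness(similarity_map)
    d_conf = peak_sharpness(discriminative_map)
    total = s_conf + d_conf
    # Fall back to an equal blend when both maps are essentially flat.
    w = 0.5 if total < eps else s_conf / total
    fused = [
        [w * s + (1.0 - w) * d for s, d in zip(s_row, d_row)]
        for s_row, d_row in zip(similarity_map, discriminative_map)
    ]
    return fused, w


def locate_target(fused):
    """Return the (row, col) of the highest response in the fused map."""
    _, row, col = max(
        (v, r, c) for r, line in enumerate(fused) for c, v in enumerate(line)
    )
    return row, col


if __name__ == "__main__":
    # Toy 3x3 maps: the similarity map has a sharp central peak, while the
    # discriminative map is flatter with a weak peak in one corner.
    sim = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.2], [0.1, 0.2, 0.1]]
    dis = [[0.3, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.5]]
    fused_map, weight = fuse_response_maps(sim, dis)
    print(weight, locate_target(fused_map))
```

In this toy setup the sharper map dominates the blend, so a weak, ambiguous peak in one branch is less likely to pull the final location away from a confident peak in the other. A learned fusion module could replace the hand-written weight.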
Pages: 7284-7300
Page count: 17
Related papers
50 in total
  • [41] Transformer-Based Multi-Target Object Detection and Tracking Framework for Robust Spatio-Temporal Memory in Dynamic Environments
    Alzubi, Tareq Mahmod
    Mukhtar, Umar Raza
    IEEE ACCESS, 2025, 13 : 47146 - 47164
  • [42] Spatio-Temporal Enhanced Contrastive and Contextual Learning for Weather Forecasting
    Gong, Yongshun
    He, Tiantian
    Chen, Meng
    Wang, Bin
    Nie, Liqiang
    Yin, Yilong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (08) : 4260 - 4274
  • [43] Similarity- and Quality-Guided Relation Learning for Joint Detection and Tracking
    Feng, Weitao
    Bai, Lei
    Yao, Yongqiang
    Gan, Weihao
    Wu, Wei
    Ouyang, Wanli
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1267 - 1280
  • [44] Measuring Trajectory Similarity Based on the Spatio-Temporal Properties of Moving Objects in Road Networks
    Dorosti, Ali
    Alesheikh, Ali Asghar
    Sharif, Mohammad
    INFORMATION, 2024, 15 (01)
  • [45] STAT: Multi-Object Tracking Based on Spatio-Temporal Topological Constraints
    Zhang, Junjie
    Wang, Mingyan
    Jiang, Haoran
    Zhang, Xinyu
    Yan, Chenggang
    Zeng, Dan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4445 - 4457
  • [46] An improved target tracking algorithm based on spatio-temporal context under occlusions
    Yang, Xin
    Zhu, Songyan
    Zhou, Dake
    Zhang, Yifan
    MULTIDIMENSIONAL SYSTEMS AND SIGNAL PROCESSING, 2020, 31 (01) : 329 - 344
  • [47] Spatio-temporal Similarity based Privacy-preserving Worker Selection in Mobile Crowdsensing
    Zhang, Xichen
    Lu, Rongxing
    Ray, Suprio
    Shao, Jun
    Ghorbani, Ali A.
    2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021,
  • [49] Video-Based Multi-Camera Vehicle Tracking via Appearance-Parsing Spatio-Temporal Trajectory Matching Network
    Zhang, Xiaoqin
    Yu, Hongqi
    Qin, Yong
    Zhou, Xiaolong
    Chan, Sixian
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 10077 - 10091
  • [50] DSTVis: toward better interactive visual analysis of Drones' spatio-temporal data
    Chen, Fengxin
    Yu, Ye
    Ni, Liangliang
    Zhang, Zhenya
    Lu, Qiang
    JOURNAL OF VISUALIZATION, 2024, 27 (04) : 623 - 638