Joint Spatio-Temporal Similarity and Discrimination Learning for Visual Tracking

Cited by: 0
Authors
Liang, Yanjie [1]
Chen, Haosheng [2]
Wu, Qiangqiang [3]
Xia, Changqun [1]
Li, Jia [4]
Affiliations
[1] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing Key Lab Image Cognit, Chongqing 400065, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Target tracking; Location awareness; Correlation; Visualization; Learning systems; Circuits and systems; Transformers; Video object tracking; joint learning; spatio-temporal similarity; spatio-temporal discrimination; adaptive response map fusion;
DOI
10.1109/TCSVT.2024.3377379
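Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code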
0808 ; 0809 ;
Abstract
Visual tracking is the task of continuously localizing a target in a video, given only its initial state in the first frame. This limited target information makes the problem extremely challenging. Existing tracking methods perform either matching-based similarity learning or optimization-based discrimination reasoning. However, the former are ineffective at distinguishing target objects from background distractors, and the latter are insufficient at maintaining spatio-temporal consistency across successive frames. In this paper, we design a joint spatio-temporal similarity and discrimination learning (STSDL) framework for accurate and robust tracking. The framework is composed of two complementary branches: a similarity learning branch and a discrimination learning branch. The similarity learning branch uses a transformer encoder-decoder to gather rich spatio-temporal context information and generate a similarity map. In parallel, the discrimination learning branch uses an efficient model predictor to train a target model that produces a discriminative map. Finally, the similarity map and the discriminative map are adaptively fused for accurate and robust target localization. Experimental results on six prevalent datasets demonstrate that the proposed STSDL obtains satisfactory results while retaining a real-time tracking speed of 50 FPS on a single GPU.
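The abstract does not say how the two maps are fused, so the following is only an illustrative sketch of one possible adaptive fusion rule, not the STSDL method. It assumes each map is weighted by a simple confidence score (peak value minus mean value). The names `peak_confidence`, `fuse_maps`, and `argmax2d` are hypothetical.

```python
# Illustrative sketch only: confidence-weighted fusion of two response maps.
# The weighting rule (peak minus mean) is an assumption, not taken from the paper.

def peak_confidence(response):
    """Return max(response) - mean(response), an assumed sharpness score."""
    flat = [v for row in response for v in row]
    return max(flat) - sum(flat) / len(flat)

def fuse_maps(similarity, discriminative):
    """Blend two equally sized 2-D maps; return (fused_map, similarity_weight)."""
    c_s = peak_confidence(similarity)
    c_d = peak_confidence(discriminative)
    total = c_s + c_d
    # Fall back to equal weights when both maps are completely flat.
    w = 0.5 if total <= 0 else c_s / total
    fused = [
        [w * s + (1.0 - w) * d for s, d in zip(row_s, row_d)]
        for row_s, row_d in zip(similarity, discriminative)
    ]
    return fused, w

def argmax2d(response):
    """Return the (row, col) position of the largest value in a 2-D map."""
    best = max((v, r, c) for r, row in enumerate(response) for c, v in enumerate(row))
    return best[1], best[2]

if __name__ == "__main__":
    similarity = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.2], [0.1, 0.2, 0.1]]
    discriminative = [[0.3, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.35]]
    fused, w = fuse_maps(similarity, discriminative)
    print("similarity weight:", round(w, 3))
    print("predicted target cell:", argmax2d(fused))
```

In this sketch the map with the sharper peak receives the larger weight, so a flat, ambiguous response from one branch has less influence on the final localization.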
Pages: 7284 - 7300
Page count: 17
Related papers
50 in total
  • [1] Joint spatio-temporal modeling for visual tracking
    Sun, Yumei
    Tang, Chuanming
    Luo, Hui
    Li, Qingqing
    Peng, Xiaoming
    Zhang, Jianlin
    Li, Meihui
    Wei, Yuxing
    KNOWLEDGE-BASED SYSTEMS, 2024, 283
  • [2] Spatio-temporal Active Learning for Visual Tracking
    Liu, Chenfeng
    Zhu, Pengfei
    Hu, Qinghua
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [3] Learning Spatio-Temporal Transformer for Visual Tracking
    Yan, Bin
    Peng, Houwen
    Fu, Jianlong
    Wang, Dong
    Lu, Huchuan
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 10428 - 10437
  • [4] Learning spatio-temporal correlation filter for visual tracking
    Yan, Youmin
    Guo, Xixian
    Tang, Jin
    Li, Chenglong
    Wang, Xin
    NEUROCOMPUTING, 2021, 436 : 273 - 282
  • [5] Deep learning of spatio-temporal information for visual tracking
    Gwangmin Choe
    Ilmyong Son
    Chunhwa Choe
    Hyoson So
    Hyokchol Kim
    Gyongnam Choe
    Multimedia Tools and Applications, 2022, 81 : 17283 - 17302
  • [6] Deep learning of spatio-temporal information for visual tracking
    Choe, Gwangmin
    Son, Ilmyong
    Choe, Chunhwa
    So, Hyoson
    Kim, Hyokchol
    Choe, Gyongnam
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (12) : 17283 - 17302
  • [7] Adaptive spatio-temporal context learning for visual tracking
    Zhang, Yaqin
    Wang, Liejun
    Qin, Jiwei
    IMAGING SCIENCE JOURNAL, 2019, 67 (03) : 136 - 147
  • [8] Online Spatio-temporal Structural Context Learning for Visual Tracking
    Wen, Longyin
    Cai, Zhaowei
    Lei, Zhen
    Yi, Dong
    Li, Stan Z.
    COMPUTER VISION - ECCV 2012, PT IV, 2012, 7575 : 716 - 729
  • [9] Adaptive Spatio-Temporal Context Learning for Visual Target Tracking
    Marvasti-Zadeh, Seyed Mojtaba
    Ghanei-Yakhdan, Hossein
    Kasaei, Shohreh
    2017 10TH IRANIAN CONFERENCE ON MACHINE VISION AND IMAGE PROCESSING (MVIP), 2017, : 10 - 14
  • [10] Spatio-temporal joint aberrance suppressed correlation filter for visual tracking
    Libin Xu
    Pyoungwon Kim
    Mengjie Wang
    Jinfeng Pan
    Xiaomin Yang
    Mingliang Gao
    Complex & Intelligent Systems, 2022, 8 : 3765 - 3777