Joint Spatio-Temporal Similarity and Discrimination Learning for Visual Tracking

Cited by: 0
Authors
Liang, Yanjie [1]
Chen, Haosheng [2]
Wu, Qiangqiang [3]
Xia, Changqun [1]
Li, Jia [4]
Affiliations
[1] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing Key Lab Image Cognit, Chongqing 400065, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Target tracking; Location awareness; Correlation; Visualization; Learning systems; Circuits and systems; Transformers; Video object tracking; joint learning; spatio-temporal similarity; spatio-temporal discrimination; adaptive response map fusion;
DOI
10.1109/TCSVT.2024.3377379
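Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract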
Visual tracking is the task of continuously localizing a target throughout a video, given only its initial state in the first frame. The limited information about the target makes this task extremely challenging. Existing tracking methods perform either matching-based similarity learning or optimization-based discrimination reasoning. However, matching-based methods are ineffective at distinguishing target objects from background distractors, whereas optimization-based methods struggle to maintain spatio-temporal consistency across successive frames. In this paper, we design a joint spatio-temporal similarity and discrimination learning (STSDL) framework for accurate and robust tracking. The framework consists of two complementary branches: a similarity learning branch and a discrimination learning branch. The similarity learning branch uses an effective transformer encoder-decoder to gather rich spatio-temporal context information and generate a similarity map. In parallel, the discrimination learning branch uses an efficient model predictor to train a target model that produces a discriminative map. Finally, the similarity map and the discriminative map are adaptively fused for accurate and robust target localization. Experimental results on six widely used datasets demonstrate that the proposed STSDL obtains satisfactory results while maintaining a real-time tracking speed of 50 FPS on a single GPU.
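The abstract says the similarity map and the discriminative map are "adaptively fused" but does not state the fusion rule. As a minimal sketch under that gap, the Python below assumes each map is weighted by its peak-to-sidelobe ratio (PSR), a confidence heuristic commonly used in correlation-filter trackers. The functions `argmax2d`, `psr`, and `fuse` and the `exclude` window parameter are hypothetical illustrations, not the authors' STSDL fusion module.

```python
import math
from typing import List, Tuple

Map = List[List[float]]  # a 2-D response map stored as a list of rows


def argmax2d(m: Map) -> Tuple[int, int]:
    """Return (row, col) of the largest entry (first occurrence wins ties)."""
    best = (0, 0)
    for i, row in enumerate(m):
        for j, v in enumerate(row):
            if v > m[best[0]][best[1]]:
                best = (i, j)
    return best


def psr(m: Map, exclude: int = 1) -> float:
    """Peak-to-sidelobe ratio: (peak - mean(sidelobe)) / std(sidelobe).

    The sidelobe is every entry outside a (2*exclude+1) x (2*exclude+1)
    window centred on the peak. A sharp, isolated peak gives a high PSR.
    """
    pi, pj = argmax2d(m)
    peak = m[pi][pj]
    side = [v for i, row in enumerate(m) for j, v in enumerate(row)
            if abs(i - pi) > exclude or abs(j - pj) > exclude]
    if not side:
        return 0.0
    mean = sum(side) / len(side)
    std = math.sqrt(sum((v - mean) ** 2 for v in side) / len(side))
    return (peak - mean) / (std + 1e-8)  # epsilon avoids division by zero


def fuse(sim: Map, disc: Map, exclude: int = 1) -> Tuple[Map, Tuple[int, int], float]:
    """Blend two same-shaped maps, weighting each by its (clamped) PSR.

    ASSUMPTION: PSR weighting is a stand-in for the paper's adaptive fusion.
    Returns the fused map, the fused peak location, and the similarity weight.
    """
    if len(sim) != len(disc) or any(len(a) != len(b) for a, b in zip(sim, disc)):
        raise ValueError("response maps must have the same shape")
    ps = max(psr(sim, exclude), 0.0)
    pd = max(psr(disc, exclude), 0.0)
    w = 0.5 if ps + pd == 0 else ps / (ps + pd)  # equal weights if neither is confident
    fused = [[w * a + (1 - w) * b for a, b in zip(ra, rb)]
             for ra, rb in zip(sim, disc)]
    return fused, argmax2d(fused), w


if __name__ == "__main__":
    # Toy 5x5 maps: the similarity map has one sharp peak at (2, 2);
    # the discriminative map peaks on a distractor at (0, 4).
    sim = [[0.1] * 5 for _ in range(5)]
    sim[2][2], sim[4][0] = 1.0, 0.3
    disc = [[0.1] * 5 for _ in range(5)]
    disc[2][2], disc[0][4] = 0.6, 0.7
    fused, loc, w = fuse(sim, disc)
    print(argmax2d(disc), loc, round(w, 3))
```

In this toy setup the discriminative map alone would lock onto the distractor at (0, 4), while the sharper similarity map receives the larger weight and pulls the fused peak back to (2, 2). A learned fusion module could replace the PSR heuristic; the heuristic is used here only because it needs no training.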
Pages: 7284-7300
Number of pages: 17
Related papers
50 in total
  • [1] Adaptive spatio-temporal context learning for visual tracking
    Zhang, Yaqin
    Wang, Liejun
    Qin, Jiwei
    IMAGING SCIENCE JOURNAL, 2019, 67 (03) : 136 - 147
  • [2] Combining Spatio-Temporal Context and Kalman Filtering for Visual Tracking
    Yang, Haoran
    Wang, Juanjuan
    Miao, Yi
    Yang, Yulu
    Zhao, Zengshun
    Wang, Zhigang
    Sun, Qian
    Wu, Dapeng Oliver
    MATHEMATICS, 2019, 7 (11)
  • [3] Hypothesis Testing Based Tracking With Spatio-Temporal Joint Interaction Modeling
    Sheng, Hao
    Zhang, Yang
    Wu, Yubin
    Wang, Shuai
    Lyu, Weifeng
    Ke, Wei
    Xiong, Zhang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (09) : 2971 - 2983
  • [4] Learning Sequence Descriptor Based on Spatio-Temporal Attention for Visual Place Recognition
    Zhao, Junqiao
    Zhang, Fenglin
    Cai, Yingfeng
    Tian, Gengxuan
    Mu, Wenjie
    Ye, Chen
    Feng, Tiantian
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (03) : 2351 - 2358
  • [5] Video Text Tracking With a Spatio-Temporal Complementary Model
    Gao, Yuzhe
    Li, Xing
    Zhang, Jiajian
    Zhou, Yu
    Jin, Dian
    Wang, Jing
    Zhu, Shenggao
    Bai, Xiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 9321 - 9331
  • [6] Spatio-Temporal Contextual Learning for Single Object Tracking on Point Clouds
    Gao, Jiantao
    Yan, Xu
    Zhao, Weibing
    Lyu, Zhen
    Liao, Yinghong
    Zheng, Chaoda
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (07) : 9470 - 9482
  • [7] Spatio-temporal wavelets and tracking in noisy environments
    Mujica, F
    Murenzi, R
    Smith, MJT
    WAVELET APPLICATIONS V, 1998, 3391 : 560 - 568
  • [8] Spatio-Temporal Similarity Search Method for Disaster Estimation
    Hayashi, Hideki
    Asahara, Akinori
    Sugaya, Natsuko
    Ogawa, Yuichi
    Tomita, Hitoshi
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015 : 2462 - 2469
  • [9] Spatio-Temporal Data Augmentation for Visual Surveillance
    Kim, Jae-Yeul
    Ha, Jong-Eun
    IEEE ACCESS, 2021, 9 : 165014 - 165033
  • [10] Learning All Dynamics: Traffic Forecasting via Locality-Aware Spatio-Temporal Joint Transformer
    Fang, Yuchen
    Zhao, Fang
    Qin, Yanjun
    Luo, Haiyong
    Wang, Chenxing
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (12) : 23433 - 23446