Joint Spatio-Temporal Similarity and Discrimination Learning for Visual Tracking

Citations: 0
Authors
Liang, Yanjie [1]
Chen, Haosheng [2]
Wu, Qiangqiang [3]
Xia, Changqun [1]
Li, Jia [4]
Affiliations
[1] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing Key Lab Image Cognit, Chongqing 400065, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Target tracking; Location awareness; Correlation; Visualization; Learning systems; Circuits and systems; Transformers; Video object tracking; joint learning; spatio-temporal similarity; spatio-temporal discrimination; adaptive response map fusion;
DOI
10.1109/TCSVT.2024.3377379
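Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes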
0808 ; 0809 ;
Abstract
Visual tracking is the task of continuously localizing a target in a video, given only its initial state in the first frame. The limited target information makes the problem extremely challenging. Existing tracking methods perform either matching-based similarity learning or optimization-based discrimination reasoning. However, the former struggles to distinguish target objects from background distractors, while the latter is insufficient at maintaining spatio-temporal consistency across successive frames. In this paper, we design a joint spatio-temporal similarity and discrimination learning (STSDL) framework for accurate and robust tracking. The framework is composed of two complementary branches: a similarity learning branch and a discrimination learning branch. The similarity learning branch uses a transformer encoder-decoder to gather rich spatio-temporal context information and generate a similarity map. In parallel, the discrimination learning branch uses an efficient model predictor to train a target model that produces a discriminative map. Finally, the similarity map and the discriminative map are adaptively fused for accurate and robust target localization. Experimental results on six prevalent datasets demonstrate that the proposed STSDL obtains satisfactory results while retaining a real-time tracking speed of 50 FPS on a single GPU.
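The fusion step described in the abstract can be illustrated with a small sketch. The abstract does not say how the adaptive weights are computed, so everything below is an assumption made for illustration: the function names (`peak_confidence`, `fuse_maps`, `locate`) are hypothetical, and the weighting rule (peak value minus map mean) is a simple stand-in heuristic, not the STSDL fusion module.

```python
from typing import List, Tuple

Map = List[List[float]]  # a 2-D response map stored as nested lists


def peak_confidence(resp: Map) -> float:
    """Hypothetical confidence score: peak response minus mean response.

    A sharp, isolated peak scores high; a flat or ambiguous map scores low.
    """
    flat = [v for row in resp for v in row]
    return max(flat) - sum(flat) / len(flat)


def fuse_maps(sim: Map, dis: Map) -> Tuple[Map, float]:
    """Blend two equally sized maps; return the fused map and the similarity weight."""
    c_sim, c_dis = peak_confidence(sim), peak_confidence(dis)
    total = c_sim + c_dis
    # Fall back to an equal blend when both maps are completely flat.
    w = 0.5 if total <= 0 else c_sim / total
    fused = [
        [w * s + (1.0 - w) * d for s, d in zip(row_s, row_d)]
        for row_s, row_d in zip(sim, dis)
    ]
    return fused, w


def locate(resp: Map) -> Tuple[int, int]:
    """Return the (row, col) position of the maximum response."""
    best = max((v, r, c) for r, row in enumerate(resp) for c, v in enumerate(row))
    return best[1], best[2]


# Toy example: the similarity map has a distractor at (2, 2) that slightly
# outscores the true target at (1, 1); the discriminative map suppresses it.
sim_map = [[0.1, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.9]]
dis_map = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.1]]
fused_map, w_sim = fuse_maps(sim_map, dis_map)
print(locate(sim_map), locate(fused_map))
```

In this toy case the similarity map alone would select the distractor, while the fused map selects the location both branches agree on. A learned fusion module would presumably replace the hand-written confidence score.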
Pages: 7284-7300
Number of pages: 17