Joint Spatio-Temporal Similarity and Discrimination Learning for Visual Tracking

Authors
Liang, Yanjie [1]
Chen, Haosheng [2]
Wu, Qiangqiang [3]
Xia, Changqun [1]
Li, Jia [4]
Affiliations
[1] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing Key Lab Image Cognit, Chongqing 400065, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Target tracking; Location awareness; Correlation; Visualization; Learning systems; Circuits and systems; Transformers; Video object tracking; joint learning; spatio-temporal similarity; spatio-temporal discrimination; adaptive response map fusion
DOI
10.1109/TCSVT.2024.3377379
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification code
0808; 0809
Abstract
Visual tracking is the task of continuously localizing a target in a video, given only the initial target state in the first frame. This limited target information makes the problem extremely challenging. Existing tracking methods perform either matching-based similarity learning or optimization-based discrimination reasoning. However, the former are ineffective at distinguishing target objects from background distractors, while the latter do not sufficiently maintain spatio-temporal consistency across successive frames. In this paper, we design a joint spatio-temporal similarity and discrimination learning (STSDL) framework for accurate and robust tracking. The framework is composed of two complementary branches: a similarity learning branch and a discrimination learning branch. The similarity learning branch uses an effective transformer encoder-decoder to gather rich spatio-temporal context information and generate a similarity map. In parallel, the discrimination learning branch uses an efficient model predictor to train a target model that produces a discriminative map. Finally, the similarity map and the discriminative map are adaptively fused for accurate and robust target localization. Experimental results on six prevalent datasets demonstrate that the proposed STSDL obtains satisfactory results while retaining a real-time tracking speed of 50 FPS on a single GPU.
Pages: 7284-7300
Number of pages: 17
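
The abstract states that the similarity map and the discriminative map are adaptively fused before localization, but it does not give the fusion rule. The following is a minimal sketch of one possible scheme, not the authors' method: it assumes both maps have the same size and weights each by a hypothetical confidence score (peak value minus mean value). The names peak_confidence, fuse_maps, and locate_target are illustrative only.

```python
from typing import List, Tuple

Map = List[List[float]]


def peak_confidence(response: Map) -> float:
    """Hypothetical confidence score: peak value minus mean value of the map."""
    values = [v for row in response for v in row]
    return max(0.0, max(values) - sum(values) / len(values))


def fuse_maps(similarity: Map, discriminative: Map) -> Tuple[Map, float]:
    """Convex combination of two same-sized maps, weighted by peak confidence.

    This weighting is an assumption made for illustration; the paper's
    actual adaptive fusion is not described in the abstract.
    """
    c_sim = peak_confidence(similarity)
    c_dis = peak_confidence(discriminative)
    total = c_sim + c_dis
    # Fall back to equal weights when both maps are (nearly) flat.
    w_sim = 0.5 if total < 1e-12 else c_sim / total
    fused = [
        [w_sim * s + (1.0 - w_sim) * d for s, d in zip(s_row, d_row)]
        for s_row, d_row in zip(similarity, discriminative)
    ]
    return fused, w_sim


def locate_target(fused: Map) -> Tuple[int, int]:
    """Return the (row, col) position of the maximum fused response."""
    best = max(
        ((v, r, c) for r, row in enumerate(fused) for c, v in enumerate(row)),
        key=lambda t: t[0],
    )
    return best[1], best[2]


# Toy example: the similarity map peaks on a distractor at (2, 2),
# while the discriminative map peaks sharply on the target at (1, 1).
similarity_map = [[0.1, 0.2, 0.1], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]
discriminative_map = [[0.0, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.1]]

fused_map, weight_sim = fuse_maps(similarity_map, discriminative_map)
print(locate_target(fused_map))
```

In this toy input the fused peak falls on (1, 1) rather than on the distractor, because the sharper discriminative map receives the larger weight. A learned fusion module could replace the hand-written confidence score.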