Joint Spatio-Temporal Similarity and Discrimination Learning for Visual Tracking

Cited by: 0

Authors:

Liang, Yanjie [1]

Chen, Haosheng [2]

Wu, Qiangqiang [3]

Xia, Changqun [1]

Li, Jia [4]

Affiliations:

[1] Peng Cheng Lab, Shenzhen 518000, Peoples R China

[2] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing Key Lab Image Cognit, Chongqing 400065, Peoples R China

[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China

[4] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024, Vol. 34, No. 8

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

Target tracking; Location awareness; Correlation; Visualization; Learning systems; Circuits and systems; Transformers; Video object tracking; joint learning; spatio-temporal similarity; spatio-temporal discrimination; adaptive response map fusion;

DOI:

10.1109/TCSVT.2024.3377379

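Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic and communication technology];

Discipline classification codes: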

0808 ; 0809 ;

Abstract:

Visual tracking is the task of continuously localizing a target in a video given its initial state in the first frame. The limited target information makes this an extremely challenging problem. Existing tracking methods perform either matching-based similarity learning or optimization-based discrimination reasoning. However, the former is ineffective at distinguishing target objects from background distractors, and the latter is insufficient at maintaining spatio-temporal consistency among successive frames. In this paper, we design a joint spatio-temporal similarity and discrimination learning (STSDL) framework for accurate and robust tracking. The framework is composed of two complementary branches: a similarity learning branch and a discrimination learning branch. The similarity learning branch uses an effective transformer encoder-decoder to gather rich spatio-temporal context information and generate a similarity map. In parallel, the discrimination learning branch uses an efficient model predictor to train a target model that produces a discriminative map. Finally, the similarity map and the discriminative map are adaptively fused for accurate and robust target localization. Experiments on six widely used datasets show that the proposed STSDL obtains satisfactory results while retaining a real-time tracking speed of 50 FPS on a single GPU.
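
The abstract does not say how the two maps are fused, so the following is only a minimal sketch of the general idea of adaptive response map fusion, not the paper's method. It assumes each branch outputs a 2D response map of the same size and weights each map by a hypothetical confidence score (peak value minus mean value). The names `peak_confidence`, `fuse_maps`, and `locate` are illustrative and do not come from the paper.

```python
from typing import List, Tuple

Map = List[List[float]]  # a 2D response map stored as rows of scores


def peak_confidence(response: Map) -> float:
    """Hypothetical confidence score: peak value minus mean value.

    A sharp, isolated peak gives a high score; a flat map gives zero.
    This heuristic is an assumption, not taken from the paper.
    """
    values = [v for row in response for v in row]
    return max(values) - sum(values) / len(values)


def fuse_maps(similarity: Map, discriminative: Map) -> Tuple[Map, float]:
    """Fuse two same-sized maps with confidence-derived weights.

    Returns the fused map and the weight given to the similarity map.
    """
    c_sim = peak_confidence(similarity)
    c_dis = peak_confidence(discriminative)
    total = c_sim + c_dis
    # Fall back to equal weights when both maps are flat.
    w = 0.5 if total <= 0 else c_sim / total
    fused = [
        [w * s + (1.0 - w) * d for s, d in zip(row_s, row_d)]
        for row_s, row_d in zip(similarity, discriminative)
    ]
    return fused, w


def locate(response: Map) -> Tuple[int, int]:
    """Return the (row, column) of the first maximum in the map."""
    cells = ((v, r, c) for r, row in enumerate(response) for c, v in enumerate(row))
    _, r, c = max(cells, key=lambda t: t[0])
    return r, c


if __name__ == "__main__":
    sim = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.2], [0.1, 0.2, 0.1]]
    dis = [[0.0, 0.0, 0.0], [0.0, 0.8, 0.1], [0.0, 0.0, 0.0]]
    fused, w = fuse_maps(sim, dis)
    print("similarity weight:", round(w, 3))
    print("estimated target cell:", locate(fused))
```

In this sketch the branch whose map has the more pronounced peak contributes more to the fused map. A learned fusion module could replace the hand-written confidence score without changing the overall structure.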


Pages: 7284-7300

Page count: 17

