Towards Real-World Visual Tracking With Temporal Contexts

被引:24
作者
Cao, Ziang [1 ]
Huang, Ziyuan [2 ]
Pan, Liang [1 ]
Zhang, Shiwei [3 ]
Liu, Ziwei [1 ]
Fu, Changhong [4 ]
机构
[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
[2] Natl Univ Singapore, Dept Mech Engn, Singapore 119077, Singapore
[3] DAMO Acad, Alibaba Grp, Hangzhou 310052, Zhejiang, Peoples R China
[4] Tongji Univ, Sch Mech Engn, Shanghai 201804, Peoples R China
基金
中国国家自然科学基金; 上海市自然科学基金;
关键词
Latency-aware evaluations; real-world tests; temporal contexts; two-level framework; visual tracking; PLUS PLUS; NETWORK;
D O I
10.1109/TPAMI.2023.3307174
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Visual tracking has made significant improvements in the past few decades. Most existing state-of-the-art trackers 1) merely aim for performance in ideal conditions while overlooking the real-world conditions; 2) adopt the tracking-by-detection paradigm, neglecting rich temporal contexts; 3) only integrate the temporal information into the template, where temporal contexts among consecutive frames are far from being fully utilized. To handle those problems, we propose a two-level framework (TCTrack) that can exploit temporal contexts efficiently. Based on it, we propose a stronger version for real-world visual tracking, i.e., TCTrack++. It boils down to two levels: features and similarity maps. Specifically, for feature extraction, we propose an attention-based temporally adaptive convolution to enhance the spatial features using temporal information, which is achieved by dynamically calibrating the convolution weights. For similarity map refinement, we introduce an adaptive temporal transformer to encode the temporal knowledge efficiently and decode it for the accurate refinement of the similarity map. To further improve the performance, we additionally introduce a curriculum learning strategy. Also, we adopt online evaluation to measure performance in real-world conditions. Exhaustive experiments on 8 well-known benchmarks demonstrate the superiority of TCTrack++. Real-world tests directly verify that TCTrack++ can be readily used in real-world applications.
引用
收藏
页码:15834 / 15849
页数:16
相关论文
共 92 条
[11]   Siamese Box Adaptive Network for Visual Tracking [J].
Chen, Zedu ;
Zhong, Bineng ;
Li, Guorong ;
Zhang, Shengping ;
Ji, Rongrong .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :6667-6676
[12]   Context-aware Deep Feature Compression for High-speed Visual Tracking [J].
Choi, Jongwon ;
Chang, Hyung Jin ;
Fischer, Tobias ;
Yun, Sangdoo ;
Lee, Kyuewang ;
Jeong, Jiyeoup ;
Demiris, Yiannis ;
Choi, Jin Young .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :479-488
[13]   MixFormer: End-to-End Tracking with Iterative Mixed Attention [J].
Cui, Yutao ;
Jiang, Cheng ;
Wang, Limin ;
Wu, Gangshan .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, :13598-13608
[14]   Probabilistic Regression for Visual Tracking [J].
Danelljan, Martin ;
Van Gool, Luc ;
Timofte, Radu .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :7181-7190
[15]   ATOM: Accurate Tracking by Overlap Maximization [J].
Danelljan, Martin ;
Bhat, Goutam ;
Khan, Fahad Shahbaz ;
Felsberg, Michael .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :4655-4664
[16]   ECO: Efficient Convolution Operators for Tracking [J].
Danelljan, Martin ;
Bhat, Goutam ;
Khan, Fahad Shahbaz ;
Felsberg, Michael .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6931-6939
[17]   Discriminative Scale Space Tracking [J].
Danelljan, Martin ;
Hager, Gustav ;
Khan, Fahad Shahbaz ;
Felsberg, Michael .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (08) :1561-1575
[18]   Learning Spatially Regularized Correlation Filters for Visual Tracking [J].
Danelljan, Martin ;
Hager, Gustav ;
Khan, Fahad Shahbaz ;
Felsberg, Michael .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :4310-4318
[19]   Triplet Loss in Siamese Network for Object Tracking [J].
Dong, Xingping ;
Shen, Jianbing .
COMPUTER VISION - ECCV 2018, PT XIII, 2018, 11217 :472-488
[20]  
Dosovitskiy A., 2020, ICLR, V20, DOI 10.48550/arXiv.2010.11929