Towards Real-World Visual Tracking With Temporal Contexts

Cited by: 44

Authors:

Cao, Ziang [1]

Huang, Ziyuan [2]

Pan, Liang [1]

Zhang, Shiwei [3]

Liu, Ziwei [1]

Fu, Changhong [4]

Affiliations:

[1] Nanyang Technological University, School of Computer Science and Engineering, Singapore 639798, Singapore

[2] National University of Singapore, Department of Mechanical Engineering, Singapore 119077, Singapore

[3] DAMO Academy, Alibaba Group, Hangzhou 310052, Zhejiang, People's Republic of China

[4] Tongji University, School of Mechanical Engineering, Shanghai 201804, People's Republic of China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023 / Vol. 45 / No. 12

Funding:

National Natural Science Foundation of China; Natural Science Foundation of Shanghai

Keywords:

Latency-aware evaluations; real-world tests; temporal contexts; two-level framework; visual tracking; NETWORK

DOI:

10.1109/TPAMI.2023.3307174

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Visual tracking has made significant improvements in the past few decades. Most existing state-of-the-art trackers 1) merely aim for performance in ideal conditions while overlooking the real-world conditions; 2) adopt the tracking-by-detection paradigm, neglecting rich temporal contexts; 3) only integrate the temporal information into the template, where temporal contexts among consecutive frames are far from being fully utilized. To handle those problems, we propose a two-level framework (TCTrack) that can exploit temporal contexts efficiently. Based on it, we propose a stronger version for real-world visual tracking, i.e., TCTrack++. It boils down to two levels: features and similarity maps. Specifically, for feature extraction, we propose an attention-based temporally adaptive convolution to enhance the spatial features using temporal information, which is achieved by dynamically calibrating the convolution weights. For similarity map refinement, we introduce an adaptive temporal transformer to encode the temporal knowledge efficiently and decode it for the accurate refinement of the similarity map. To further improve the performance, we additionally introduce a curriculum learning strategy. Also, we adopt online evaluation to measure performance in real-world conditions. Exhaustive experiments on 8 well-known benchmarks demonstrate the superiority of TCTrack++. Real-world tests directly verify that TCTrack++ can be readily used in real-world applications.
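The abstract describes the feature-level idea only in words: convolution weights are calibrated dynamically using temporal information. The NumPy sketch below is one possible reading of that sentence and is not the authors' implementation. It assumes a 1x1 convolution, a global-average-pooled descriptor per frame, a softmax attention of the current frame over a short window, and a hypothetical projection matrix `proj` that maps the attended context to one scaling factor per output channel. All function and parameter names are illustrative.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def calibrated_weights(base_w, frame_feats, proj):
    """Scale base convolution weights by a factor derived from recent frames.

    base_w:      (out_ch, in_ch) weights of a 1x1 convolution
    frame_feats: list of (in_ch, H, W) feature maps, oldest first, current last
    proj:        (out_ch, in_ch) hypothetical projection (an assumption here)
    """
    # One descriptor per frame via global average pooling: (T, in_ch).
    desc = np.stack([f.mean(axis=(1, 2)) for f in frame_feats])
    # Attention of the current frame over every frame in the window: (T,).
    scores = desc @ desc[-1] / np.sqrt(desc.shape[1])
    attn = softmax(scores)
    # Attended temporal context: (in_ch,).
    context = attn @ desc
    # Per-output-channel factor; it equals 1 when proj @ context is 0.
    factor = 1.0 + np.tanh(proj @ context)
    return base_w * factor[:, None]


def conv1x1(x, w):
    # x: (in_ch, H, W), w: (out_ch, in_ch) -> (out_ch, H, W)
    return np.einsum("oi,ihw->ohw", w, x)


# Toy usage with random data (shapes chosen only for illustration).
rng = np.random.default_rng(0)
frames = [rng.standard_normal((4, 8, 8)) for _ in range(3)]
base_w = rng.standard_normal((6, 4))
proj = 0.1 * rng.standard_normal((6, 4))
w_t = calibrated_weights(base_w, frames, proj)
features_t = conv1x1(frames[-1], w_t)
```

With `proj` set to zeros the factor is exactly 1 and the layer reduces to the static convolution, which makes the temporal calibration easy to switch off when comparing against a baseline.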


Pages: 15834-15849

Page count: 16
