Switch and Refine: A Long-Term Tracking and Segmentation Framework

Cited by: 20
Authors
Xu, Xiang [1]
Zhao, Jian [2]
Wu, Jianmin [3]
Shen, Furao [1]
Affiliations
[1] Nanjing Univ, Sch Artificial Intelligence, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Jiangsu, Peoples R China
[3] State Grid Shanghai Maintenance Co, Shanghai 200063, Peoples R China
Keywords
Target tracking; Task analysis; Switches; Object tracking; Learning systems; Estimation; Object segmentation; Visual object tracking; long-term tracking; visual object segmentation; OBJECT TRACKING; NETWORKS;
DOI
10.1109/TCSVT.2022.3210245
CLC Number (Chinese Library Classification)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
0808 ; 0809 ;
Abstract
In long-term visual object tracking (VOT), most long-term trackers are adapted from short-term trackers and accumulate more and more machine-learning modules in pursuit of better performance. However, we empirically find that more modules do not necessarily lead to better results. In this paper, we keep the long-term tracking framework simple by carefully selecting cutting-edge trackers. Specifically, we propose a new long-term VOT framework that combines the benefits of the two mainstream short-term tracking pipelines, i.e., the discriminative online tracker and the one-shot Siamese tracker, with a global re-detector that is awakened when the target is lost. This framework fully exploits existing advanced work from three complementary perspectives. Experimental results show that by exploiting the capabilities of existing methods instead of designing new neural networks, we still achieve remarkable results on seven long-term VOT datasets. By introducing a continuously adjustable speed-control parameter, our tracker reaches 20+ FPS with only a small performance loss. The refine module not only improves bounding-box estimates but also outputs segmentation masks, so our framework can handle video object segmentation (VOS) tasks using only VOT trackers. With bounding boxes alone as the initial input, we obtain a favorable trade-off between speed and accuracy on two representative VOS datasets.
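The abstract describes a three-part control flow: two short-term trackers run side by side with a switch selecting the more confident output, a global re-detector wakes when the target is judged lost, and a refine module sharpens the box (and can emit a mask). The Python sketch below illustrates that flow; all class and method names, the confidence threshold, and the skip_interval stand-in for the speed-control parameter are illustrative assumptions, not the authors' implementation.

# A minimal control-flow sketch of the switch-and-refine idea described in the
# abstract above. Every name here (the tracker objects, redetect(),
# skip_interval, the 0.3 threshold) is a placeholder assumption for
# illustration, not the authors' actual API or parameter values.

class SwitchAndRefineSketch:
    def __init__(self, online_tracker, siamese_tracker, global_detector,
                 refiner, lost_threshold=0.3, skip_interval=1):
        self.online = online_tracker      # discriminative online tracker
        self.siamese = siamese_tracker    # one-shot Siamese tracker
        self.detector = global_detector   # global re-detector, woken when lost
        self.refiner = refiner            # refines boxes and can emit masks
        self.lost_threshold = lost_threshold
        # Hypothetical stand-in for the paper's continuously adjustable
        # speed-control parameter: refine only every `skip_interval` frames.
        self.skip_interval = skip_interval

    def track(self, frame, frame_idx):
        # Run both short-term pipelines and switch to the more confident one.
        box_a, conf_a = self.online.predict(frame)
        box_b, conf_b = self.siamese.predict(frame)
        box, conf = (box_a, conf_a) if conf_a >= conf_b else (box_b, conf_b)

        # If confidence collapses, treat the target as lost and wake the
        # global re-detector to search the whole frame.
        if conf < self.lost_threshold:
            box, conf = self.detector.redetect(frame)
            if conf >= self.lost_threshold:
                # Re-initialize both short-term trackers on the recovered box.
                self.online.reset(frame, box)
                self.siamese.reset(frame, box)

        # The refine module sharpens the bounding box and can also output a
        # segmentation mask, which is how a VOT tracker serves VOS tasks.
        mask = None
        if frame_idx % self.skip_interval == 0:
            box, mask = self.refiner.refine(frame, box)
        return box, conf, mask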
Pages: 1291-1304
Number of pages: 14
Related References
77 references in total
[1]   Learning What to Learn for Video Object Segmentation [J].
Bhat, Goutam ;
Lawin, Felix Jaremo ;
Danelljan, Martin ;
Robinson, Andreas ;
Felsberg, Michael ;
Van Gool, Luc ;
Timofte, Radu .
COMPUTER VISION - ECCV 2020, PT II, 2020, 12347 :777-794
[2]   Know Your Surroundings: Exploiting Scene Information for Object Tracking [J].
Bhat, Goutam ;
Danelljan, Martin ;
Van Gool, Luc ;
Timofte, Radu .
COMPUTER VISION - ECCV 2020, PT XXIII, 2020, 12368 :205-221
[3]   Learning Discriminative Model Prediction for Tracking [J].
Bhat, Goutam ;
Danelljan, Martin ;
Van Gool, Luc ;
Timofte, Radu .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :6181-6190
[4]   Cascade R-CNN: Delving into High Quality Object Detection [J].
Cai, Zhaowei ;
Vasconcelos, Nuno .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :6154-6162
[5]  
Chen H., 2022, IEEE T CIRC SYST VID, DOI 10.1109/TCSVT.2022.3185252
[6]   State-Aware Tracker for Real-Time Video Object Segmentation [J].
Chen, Xi ;
Li, Zuoxin ;
Yuan, Ye ;
Yu, Gang ;
Shen, Jianxin ;
Qi, Donglian .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :9381-9390
[7]   Transformer Tracking [J].
Chen, Xin ;
Yan, Bin ;
Zhu, Jiawen ;
Wang, Dong ;
Yang, Xiaoyun ;
Lu, Huchuan .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :8122-8131
[8]   Video Object Segmentation and Tracking Framework With Improved Threshold Decision and Diffusion Distance [J].
Chien, Shao-Yi ;
Chan, Wei-Kai ;
Tseng, Yu-Hsiang ;
Chen, Hong-Yuh .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2013, 23 (06) :921-934
[9]   Robust Long-Term Object Tracking via Improved Discriminative Model Prediction [J].
Choi, Seokeon ;
Lee, Junhyun ;
Lee, Yunsung ;
Hauptmann, Alexander .
COMPUTER VISION - ECCV 2020 WORKSHOPS, PT V, 2020, 12539 :602-617
[10]  
Cui C., 2020, ARXIV