Unsupervised Deep Representation Learning for Real-Time Tracking

Citations: 96
Authors
Wang, Ning [1 ]
Zhou, Wengang [1 ,2 ]
Song, Yibing [3 ]
Ma, Chao [4 ]
Liu, Wei [3 ]
Li, Houqiang [1 ,2 ]
Affiliations
[1] Univ Sci & Technol China, CAS Key Lab GIPAS, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
[3] Tencent AI Lab, Shenzhen, Peoples R China
[4] Shanghai Jiao Tong Univ, AI Inst, MOE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Keywords
Visual tracking; Unsupervised learning; Correlation filter; Siamese network; Object tracking
DOI
10.1007/s11263-020-01357-4
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning models have continuously advanced visual tracking, but they are typically trained by supervised learning on expensive labeled data. To reduce the burden of manual annotation and to learn to track arbitrary objects, we propose an unsupervised learning method for visual tracking. Our motivation is that a robust tracker should be effective in bidirectional tracking: it can localize a target object forward through successive frames and then backtrace it to its initial position in the first frame. Based on this motivation, during training we measure the consistency between the forward and backward trajectories, allowing a robust tracker to be learned from scratch using only unlabeled videos. We build our framework on a Siamese correlation filter network and propose a multi-frame validation scheme and a cost-sensitive loss to facilitate unsupervised learning. Without bells and whistles, the proposed unsupervised tracker matches the baseline accuracy of classic fully supervised trackers while running at real-time speed. Furthermore, our unsupervised framework shows potential for leveraging more unlabeled or weakly labeled data to further improve tracking accuracy.
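The forward-backward consistency idea in the abstract can be illustrated with a minimal sketch: after tracking forward through a clip and back to the first frame, the backward response map should match the initial pseudo-label. This is only an illustrative NumPy sketch under assumed names (`gaussian_label` and `cycle_consistency_loss` are hypothetical helpers, not the authors' code), with an optional per-pixel weight standing in for the paper's cost-sensitive re-weighting:

```python
import numpy as np

def gaussian_label(size, center, sigma=2.0):
    """2-D Gaussian response map of shape (size, size) centered at (row, col)."""
    rows, cols = np.ogrid[:size, :size]
    return np.exp(-((rows - center[0]) ** 2 + (cols - center[1]) ** 2)
                  / (2.0 * sigma ** 2))

def cycle_consistency_loss(initial_label, backward_response, weights=None):
    """Unsupervised objective: squared error between the response obtained
    after forward-then-backward tracking and the initial pseudo-label.
    `weights` sketches a cost-sensitive term that down-weights noisy pairs."""
    err = (backward_response - initial_label) ** 2
    if weights is not None:
        err = err * weights
    return err.mean()
```

If the tracker returns exactly to its initial position, the loss is zero; any drift in the backward trace increases it, which is the training signal that replaces manual labels.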
Pages: 400-418
Page count: 19