Unsupervised Deep Representation Learning for Real-Time Tracking

Cited by: 89
Authors
Wang, Ning [1]
Zhou, Wengang [1,2]
Song, Yibing [3]
Ma, Chao [4]
Liu, Wei [3]
Li, Houqiang [1,2]
Affiliations
[1] Univ Sci & Technol China, CAS Key Lab GIPAS, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
[3] Tencent AI Lab, Shenzhen, Peoples R China
[4] Shanghai Jiao Tong Univ, AI Inst, MOE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Keywords
Visual tracking; Unsupervised learning; Correlation filter; Siamese network; Correlation filters; Object tracking
DOI
10.1007/s11263-020-01357-4
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Advances in visual tracking have continually been driven by deep learning models, which are typically trained via supervised learning on expensive labeled data. To reduce the workload of manual annotation and to learn to track arbitrary objects, we propose an unsupervised learning method for visual tracking. Our motivation is that a robust tracker should be effective in bidirectional tracking: it should localize a target object forward through successive frames and backtrace it to its initial position in the first frame. Based on this motivation, during training we measure the consistency between the forward and backward trajectories to learn a robust tracker from scratch using only unlabeled videos. We build our framework on a Siamese correlation filter network and propose a multi-frame validation scheme and a cost-sensitive loss to facilitate unsupervised learning. Without bells and whistles, the proposed unsupervised tracker matches the baseline accuracy of classic fully supervised trackers while running at real-time speed. Furthermore, our unsupervised framework shows potential for leveraging additional unlabeled or weakly labeled data to further improve tracking accuracy.
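Below is a minimal, illustrative Python sketch of the forward-backward (cycle) consistency idea described in the abstract. It uses a closed-form, single-channel discrete correlation filter on raw patches in place of the paper's learned Siamese features, and a single forward-backward step rather than the multi-frame validation scheme or cost-sensitive loss; the function names (gaussian_label, train_dcf, respond, cycle_consistency_loss) are assumptions made for this example, not the authors' implementation.

    import numpy as np

    def gaussian_label(size, sigma=2.0):
        # Centered 2-D Gaussian response map used as the initial (pseudo) label.
        h, w = size
        ys, xs = np.mgrid[0:h, 0:w]
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
        return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

    def train_dcf(feat, label, lam=1e-4):
        # Closed-form single-channel correlation filter, solved in the Fourier domain.
        X, Y = np.fft.fft2(feat), np.fft.fft2(label)
        return Y * np.conj(X) / (X * np.conj(X) + lam)

    def respond(filt_hat, feat):
        # Correlation response of a trained filter on a new patch.
        return np.real(np.fft.ifft2(filt_hat * np.fft.fft2(feat)))

    def cycle_consistency_loss(patch_p, patch_q, label):
        # Forward: train on frame P with the initial label, then track on frame Q.
        w_p = train_dcf(patch_p, label)
        resp_q = respond(w_p, patch_q)
        # Backward: treat the response on Q as a pseudo label and track back to P.
        w_q = train_dcf(patch_q, resp_q)
        resp_p = respond(w_q, patch_p)
        # Consistency: the back-tracked response should reproduce the initial label.
        return np.mean((resp_p - label) ** 2)

    # Example with random patches standing in for cropped search regions.
    rng = np.random.default_rng(0)
    patch_p = rng.standard_normal((64, 64))
    patch_q = rng.standard_normal((64, 64))
    loss = cycle_consistency_loss(patch_p, patch_q, gaussian_label((64, 64)))

In the paper's framework, such patches correspond to feature maps produced by a trainable Siamese backbone, and the forward-backward consistency error is used as the training signal so that the representation itself is learned from unlabeled videos.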
Pages: 400-418
Number of pages: 19