A deep features based generative model for visual tracking

Cited by: 10
Authors
Feng, Ping [2 ]
Xu, Chunyan [3 ]
Zhao, Zhiqiang [2 ]
Liu, Fang [2 ]
Guo, Jingjuan [2 ]
Yuan, Caihong [2 ]
Wang, Tianjiang [2 ]
Duan, Kui [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Hosp, Wuhan 430074, Hubei, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Hubei, Peoples R China
[3] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Jiangsu, Peoples R China
Keywords
Visual tracking; Deep features; Generative model; Object tracking
DOI
10.1016/j.neucom.2018.05.007
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we propose a novel visual tracking algorithm based on a generative-model framework. To make the algorithm robust to various challenging appearance changes, we describe the tracked object's appearance with powerful deep features. These features are extracted from a Convolutional Neural Network (CNN) that modifies the VGG-M nets, using fewer convolutional layers followed exclusively by fully connected layers. During pretraining, we add a special convolutional layer, called the coefficients layer, before the fully connected layers. During tracking, after the network has been pretrained, we remove the coefficients layer and update only the fully connected layers, and only conditionally. To determine the new target position, we compute composite similarity scores that combine three kinds of similarities with different weights: similarities between candidates and the target in the first frame, similarities between candidates and the tracking result in the last frame, and similarities related to important appearance variations during tracking. We design a simple mechanism that records historical templates in a collection whenever the object's appearance changes significantly. Similarities between candidates and these historical templates alleviate the drift problem to some extent, because similar historical appearances sometimes reappear and the recorded templates then provide important information. To compute all similarities, we use the outputs of the convolutional part before the fully connected layers as features and weight them with the filter weights of the coefficients layer. Finally, the candidate with the highest score is regarded as the new target in the current frame.
Evaluation results on the CVPR2013 Online Object Tracking Benchmark show that our algorithm achieves outstanding performance compared with state-of-the-art trackers. (C) 2018 Elsevier B.V. All rights reserved.
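The composite scoring scheme sketched in the abstract can be illustrated with a short example. This is not the authors' code: the cosine similarity measure, the weights `w`, the drift threshold `change_thresh`, and all function names are hypothetical illustration choices; the feature vectors stand in for the weighted CNN features described above.

```python
# Minimal sketch (assumed details, not the paper's implementation) of
# composite similarity scoring over three template sources: first-frame
# target, last-frame result, and a collection of historical templates.
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def composite_score(cand, first_tpl, last_tpl, history, w=(0.4, 0.4, 0.2)):
    """Weighted sum of three similarities; weights w are illustrative."""
    s1 = cosine_sim(cand, first_tpl)                               # vs. first frame
    s2 = cosine_sim(cand, last_tpl)                                # vs. last result
    s3 = max((cosine_sim(cand, h) for h in history), default=0.0)  # vs. history
    return w[0] * s1 + w[1] * s2 + w[2] * s3

def maybe_record(history, last_tpl, new_tpl, change_thresh=0.7):
    """Record the new appearance when it differs largely from the last one
    (threshold is a hypothetical stand-in for the paper's mechanism)."""
    if cosine_sim(new_tpl, last_tpl) < change_thresh:
        history.append(new_tpl)

def select_target(candidates, first_tpl, last_tpl, history):
    """Return the index of the candidate with the highest composite score."""
    scores = [composite_score(c, first_tpl, last_tpl, history) for c in candidates]
    return int(np.argmax(scores))
```

With this sketch, a candidate matching both the first-frame target and the last result scores near `w[0] + w[1]`, while the history term rewards candidates that resemble a previously recorded appearance, which is how the abstract's drift mitigation works.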
Pages: 245-254 (10 pages)