Online object tracking based on CNN with spatial-temporal saliency guided sampling

Cited by: 48
Authors
Zhang, Peng [1]
Zhuo, Tao [2]
Huang, Wei [3]
Chen, Kangli [1]
Kankanhalli, Mohan [4]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Natl Univ Singapore, Sensor Enhanced Social Media SeSaMe Ctr, Singapore, Singapore
[3] Nanchang Univ, Sch Informat Engn, Nanchang, Jiangxi, Peoples R China
[4] Natl Univ Singapore, Sch Comp, Singapore, Singapore
Funding
National Research Foundation, Singapore
Keywords
Tracking; CNN; Spatial-temporal; Saliency; Sampling; Features
DOI
10.1016/j.neucom.2016.10.073
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Tracking arbitrary objects is hard because of continual intrinsic and extrinsic variations in realistic scenarios. Even for the popular tracking-by-learning strategies, effective appearance modeling of non-rigid objects remains challenging, because the targets' articulated deformations on the fly can heavily degrade the discriminative capability of the visual features generated online. With deep learning showing its success for feature extraction in various recognition tasks, more and more deep models such as CNNs have been shown to improve the performance of online tracking. However, depending only on the outputs of the last CNN layer is not an optimal representation, since the coarse spatial resolution cannot guarantee accurate localization for a qualified sampling process. In particular, when objects deform severely, sampling from a region with a pre-defined scale inevitably leads to poor online learning. To overcome this limitation of CNN-based tracking, in this work we incorporate spatial-temporal saliency detection to guide more accurate target localization for qualified sampling within an inter-frame motion flow map. With an optional strategy for combining the outputs of intra-frame appearance correlations and inter-frame motion saliency based on a compositional energy optimization, the proposed tracker shows superior performance compared with other state-of-the-art trackers on both challenging non-rigid and generic tracking benchmark datasets. © 2017 Elsevier B.V. All rights reserved.
Pages: 115-127
Page count: 13