Online object tracking based on CNN with spatial-temporal saliency guided sampling

Cited by: 48
Authors
Zhang, Peng [1]
Zhuo, Tao [2]
Huang, Wei [3]
Chen, Kangli [1]
Kankanhalli, Mohan [4]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Natl Univ Singapore, Sensor Enhanced Social Media SeSaMe Ctr, Singapore, Singapore
[3] Nanchang Univ, Sch Informat Engn, Nanchang, Jiangxi, Peoples R China
[4] Natl Univ Singapore, Sch Comp, Singapore, Singapore
Funding
National Research Foundation, Singapore
Keywords
Tracking; CNN; Spatial-temporal; Saliency; Sampling; FEATURES;
DOI
10.1016/j.neucom.2016.10.073
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Tracking arbitrary objects is hard because of continual intrinsic and extrinsic variations in realistic scenarios. Even for the popular tracking-by-learning strategies, effective appearance modeling of non-rigid objects remains challenging because the targets' articulated deformations on the fly may heavily degrade the discriminative capability of the visual features generated online. With deep learning proving successful for feature extraction in various recognition tasks, more and more deep models such as CNNs have been shown to improve the performance of online tracking. However, depending only on the outputs of the last CNN layer is not an optimal representation, since the coarse spatial resolution cannot guarantee localization accurate enough for a qualified sampling process. In particular, when objects undergo severe deformations, sampling from a region with a pre-defined scale inevitably leads to poor online learning. To overcome this limitation of CNN-based tracking, in this work we incorporated spatial-temporal saliency detection to guide more accurate target localization for qualified sampling within an inter-frame motion flow map. With an optional strategy for combining the outputs of intra-frame appearance correlations and inter-frame motion saliency based on a compositional energy optimization, the proposed tracker has shown superior performance compared with other state-of-the-art trackers on both challenging non-rigid and generic tracking benchmark datasets. (C) 2017 Elsevier B.V. All rights reserved.
Pages: 115-127
Page count: 13