Online object tracking based on CNN with spatial-temporal saliency guided sampling

Times cited: 48
Authors
Zhang, Peng [1]
Zhuo, Tao [2]
Huang, Wei [3]
Chen, Kangli [1]
Kankanhalli, Mohan [4]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Natl Univ Singapore, Sensor Enhanced Social Media SeSaMe Ctr, Singapore, Singapore
[3] Nanchang Univ, Sch Informat Engn, Nanchang, Jiangxi, Peoples R China
[4] Natl Univ Singapore, Sch Comp, Singapore, Singapore
Funding
National Research Foundation, Singapore
Keywords
Tracking; CNN; Spatial-temporal; Saliency; Sampling; FEATURES
DOI
10.1016/j.neucom.2016.10.073
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Arbitrary object tracking is hard because of continual intrinsic and extrinsic variations in realistic scenarios. Even with popular tracking-by-learning strategies, effective appearance modeling of non-rigid objects remains challenging because the targets undergo articulated deformations on the fly, which can severely degrade the discriminative capability of the online-generated visual features. As deep learning has proven successful for feature extraction in various recognition tasks, a growing number of deep models such as CNNs have been shown to improve the performance of online tracking. However, relying only on the outputs of the last CNN layer is not an optimal representation, because its coarse spatial resolution cannot guarantee the accurate localization that a qualified sampling process requires. Especially when objects deform severely, sampling from a region with a pre-defined scale inevitably leads to poor online learning. To overcome this limitation of CNN-based tracking, in this work we incorporate spatial-temporal saliency detection to guide more accurate target localization for qualified sampling within an inter-frame motion flow map. With an optional strategy for combining the outputs of intra-frame appearance correlations and inter-frame motion saliency based on a compositional energy optimization, the proposed tracker shows superior performance compared with other state-of-the-art trackers on both challenging non-rigid and generic tracking benchmark datasets. (C) 2017 Elsevier B.V. All rights reserved.
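The core idea in the abstract, using motion saliency alongside the CNN appearance response to decide where to draw training samples, can be illustrated with a minimal sketch. This is not the authors' method: the paper combines the two cues through a compositional energy optimization over an inter-frame motion flow map, whereas the sketch below assumes both cues are already available as small 2D score maps and swaps in a plain weighted sum (weight `alpha`) followed by Gaussian sampling around the fused peak. All function names (`normalize`, `fuse_maps`, `peak`, `sample_around`) and parameters are hypothetical.

```python
import random
from typing import List, Tuple

Grid = List[List[float]]


def normalize(grid: Grid) -> Grid:
    """Min-max normalize a 2D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / (hi - lo) for v in row] for row in grid]


def fuse_maps(appearance: Grid, motion: Grid, alpha: float = 0.5) -> Grid:
    """Weighted sum of normalized appearance and motion-saliency maps.

    Hypothetical stand-in for the paper's compositional energy optimization.
    """
    a, m = normalize(appearance), normalize(motion)
    return [
        [alpha * av + (1.0 - alpha) * mv for av, mv in zip(row_a, row_m)]
        for row_a, row_m in zip(a, m)
    ]


def peak(grid: Grid) -> Tuple[int, int]:
    """Return (row, col) of the maximum score; the first occurrence wins on ties."""
    _, r, c = max(
        ((v, r, c) for r, row in enumerate(grid) for c, v in enumerate(row)),
        key=lambda t: t[0],
    )
    return r, c


def sample_around(center: Tuple[int, int], height: int, width: int,
                  n: int, sigma: float, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw n integer positions from a Gaussian around center, clipped to the map."""
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    r0, c0 = center
    samples = []
    for _ in range(n):
        r = min(max(round(rng.gauss(r0, sigma)), 0), height - 1)
        c = min(max(round(rng.gauss(c0, sigma)), 0), width - 1)
        samples.append((r, c))
    return samples


if __name__ == "__main__":
    appearance = [[0.1, 0.2, 0.1],
                  [0.2, 0.9, 0.3],
                  [0.1, 0.2, 0.1]]  # toy CNN response, peak at (1, 1)
    motion = [[0.0, 0.0, 0.0],
              [0.0, 0.4, 0.8],
              [0.0, 0.0, 0.0]]      # toy motion saliency, peak at (1, 2)
    for alpha in (0.5, 0.2):
        center = peak(fuse_maps(appearance, motion, alpha))
        print(alpha, center, sample_around(center, 3, 3, n=5, sigma=0.7))
```

With these toy maps, `alpha=0.5` keeps the sampling center at the appearance peak (1, 1), while `alpha=0.2` lets the motion cue move it to (1, 2). This shows how a motion-derived cue can relocate the region from which samples are drawn instead of relying on the coarse CNN response alone.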
Pages: 115-127
Number of pages: 13
Related papers (50 in total)
  • [1] Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism
    Chu, Qi
    Ouyang, Wanli
    Li, Hongsheng
    Wang, Xiaogang
    Liu, Bin
    Yu, Nenghai
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 4846-4855
  • [2] MASK GUIDED SPATIAL-TEMPORAL FUSION NETWORK FOR MULTIPLE OBJECT TRACKING
    Zhao, Shuangye
    Wu, Yubin
    Wang, Shuai
    Ke, Wei
    Sheng, Hao
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022: 3231-3235
  • [3] Dynamic Saliency Detection via CNN and Spatial-temporal Fusion
    Qi, Zhang
    Dong, Xu
    TENTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2018), 2018, 10806
  • [4] A spatial-temporal contexts network for object tracking
    Huang, Kai
    Xiao, Kai
    Chu, Jun
    Leng, Lu
    Dong, Xingbo
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 127
  • [5] Object tracking based on adaptive updating of a spatial-temporal context model
    Feng, Wanli
    Cen, Yigang
    Zeng, Xianyou
    Li, Zhetao
    Zeng, Ming
    Voronin, Viacheslav
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2017, 11 (11): 5459-5473
  • [6] Object Tracking via Spatial-Temporal Memory Network
    Zhou, Zikun
    Li, Xin
    Zhang, Tianzhu
    Wang, Hongpeng
    He, Zhenyu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (05): 2976-2989
  • [7] Augmenting cascaded correlation filters with spatial-temporal saliency for visual tracking
    Zhao, Dawei
    Xiao, Liang
    Fu, Hao
    Wu, Tao
    Xu, Xin
    Dai, Bin
    INFORMATION SCIENCES, 2019, 470: 78-93
  • [8] Video Captioning Based on the Spatial-Temporal Saliency Tracing
    Zhou, Yuanen
    Hu, Zhenzhen
    Liu, Xueliang
    Wang, Meng
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164: 59-70
  • [9] Contrast Based Hierarchical Spatial-Temporal Saliency for Video
    Le, Trung-Nghia
    Sugimoto, Akihiro
    IMAGE AND VIDEO TECHNOLOGY, PSIVT 2015, 2016, 9431: 734-748
  • [10] Online Scene Text Tracking with Spatial-Temporal Relation
    Xiu, Yan
    Zhou, Hong-Yang
    Tian, Shu
    Yin, Xu-Cheng
    IMAGE AND GRAPHICS (ICIG 2021), PT III, 2021, 12890: 610-622