Visual tracking using spatio-temporally nonlocally regularized correlation filter

Cited by: 40
Authors
Zhang, Kaihua [1 ]
Li, Xuejun [1 ]
Song, Huihui [1 ]
Liu, Qingshan [1 ]
Lian, Wei [2 ]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Jiangsu Key Lab Big Data Anal Technol, Nanjing, Jiangsu, Peoples R China
[2] Changzhi Univ, Dept Comp Sci, Changzhi, Shanxi, Peoples R China
Keywords
Visual tracking; Video segmentation; Nonlocal appearance learning; Graphical model; Optical flow; OBJECT TRACKING; IMAGE SEGMENTATION;
DOI
10.1016/j.patcog.2018.05.017
CLC classification number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Owing to factors such as fast motion, cluttered backgrounds, and arbitrary variations in object appearance and shape, an effective target representation plays a key role in robust visual tracking. Existing methods often represent the target with a bounding box, which is easily contaminated by cluttered background noise and may cause drift when the target undergoes large non-rigid or articulated motion. To address this issue, motivated by the spatio-temporal nonlocality of target appearance reoccurrence in a video, we exploit nonlocal information to accurately represent and segment the target, yielding an object likelihood map that regularizes a correlation filter (CF) for visual tracking. Specifically, given a set of tracked target bounding boxes, we first generate a set of superpixels to represent the foreground and background, and then update the appearance of each superpixel with its long-term spatio-temporally nonlocal counterparts. With the updated appearances, we formulate a spatio-temporal graphical model comprising superpixel label-consistency potentials. We then obtain a segmentation by optimizing the graphical model, iteratively updating the appearance model and estimating the labels. Finally, from the segmentation mask we derive an object likelihood map that adaptively regularizes the CF learning, suppressing cluttered background noise while making full use of long-term stable target appearance information. Extensive evaluations on the OTB50, SegTrack, and Youtube-Objects datasets demonstrate the effectiveness of the proposed method, which performs favorably against state-of-the-art methods. (C) 2018 Elsevier Ltd. All rights reserved.
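The core idea of using an object likelihood map to regularize CF learning can be sketched compactly. Below is a minimal, hypothetical MOSSE-style correlation filter in NumPy in which the likelihood map simply masks the training patch before the closed-form Fourier-domain solve; this is a deliberate simplification of the paper's adaptive regularization, and all function names are illustrative, not from the paper.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    # Desired correlation output: a Gaussian peak at the patch center.
    ys, xs = np.mgrid[0:h, 0:w]
    dist2 = (ys - h // 2) ** 2 + (xs - w // 2) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))

def learn_cf(patch, likelihood, lam=1e-2):
    # Weight the patch by the object-likelihood map so background
    # pixels contribute little to the filter (simplified stand-in for
    # the paper's likelihood-map regularization).
    X = np.fft.fft2(patch * likelihood)
    Y = np.fft.fft2(gaussian_label(*patch.shape))
    # Closed-form ridge-regression solution in the Fourier domain
    # (returns the conjugate filter H*, as in MOSSE).
    return Y * np.conj(X) / (X * np.conj(X) + lam)

def respond(filt_conj, patch):
    # Correlation response; its peak gives the estimated translation.
    return np.real(np.fft.ifft2(filt_conj * np.fft.fft2(patch)))
```

Evaluating the learned filter on its own training patch should place the response peak at the patch center, since the training label peaks there.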
Pages: 185-195
Page count: 11