SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks

Cited by: 1614
Authors
Li, Bo [1]
Wu, Wei [1]
Wang, Qiang [2]
Zhang, Fangyi [3]
Xing, Junliang [2]
Yan, Junjie [1]
Affiliations
[1] SenseTime Res, Hong Kong, Peoples R China
[2] CASIA, NLPR, Beijing, Peoples R China
[3] ICT, VIPL, Beijing, Peoples R China
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00441
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Siamese-network-based trackers formulate tracking as convolutional feature cross-correlation between a target template and a search region. However, Siamese trackers still have an accuracy gap compared with state-of-the-art algorithms, and they cannot take advantage of features from deep networks such as ResNet-50 or deeper. In this work we prove that the core reason is the lack of strict translation invariance. Through comprehensive theoretical analysis and experimental validation, we break this restriction with a simple yet effective spatial-aware sampling strategy and successfully train a ResNet-driven Siamese tracker with a significant performance gain. Moreover, we propose a new model architecture that performs layer-wise and depthwise aggregations, which not only further improves the accuracy but also reduces the model size. We conduct extensive ablation studies to demonstrate the effectiveness of the proposed tracker, which currently obtains the best results on five large tracking benchmarks: OTB2015, VOT2018, UAV123, LaSOT, and TrackingNet.
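The cross-correlation formulation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `xcorr_depthwise` and `peak_location` are hypothetical, the feature maps are tiny hand-written lists rather than learned CNN features, and summing the per-channel responses is an assumption made only to locate a single peak. It shows the depthwise variant, in which each template channel slides over the matching channel of the search region.

```python
def xcorr_depthwise(search, template):
    """Depthwise cross-correlation on [C][H][W] nested lists.

    Channel c of the template slides over channel c of the search
    features, giving one response map per channel.
    """
    assert len(search) == len(template), "channel counts must match"
    sh, sw = len(search[0]), len(search[0][0])
    th, tw = len(template[0]), len(template[0][0])
    out_h, out_w = sh - th + 1, sw - tw + 1
    response = []
    for s_chan, t_chan in zip(search, template):
        chan = []
        for i in range(out_h):
            row = []
            for j in range(out_w):
                acc = 0.0
                for u in range(th):
                    for v in range(tw):
                        acc += s_chan[i + u][j + v] * t_chan[u][v]
                row.append(acc)
            chan.append(row)
        response.append(chan)
    return response


def peak_location(response):
    """Return (row, col) of the maximum of the channel-summed response."""
    out_h, out_w = len(response[0]), len(response[0][0])
    best, best_pos = float("-inf"), (0, 0)
    for i in range(out_h):
        for j in range(out_w):
            score = sum(chan[i][j] for chan in response)
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos


# Toy data (assumed for illustration): a 2-channel 4x4 search region
# containing the 2x2 template pattern at row 1, column 2.
template = [
    [[1, 2], [3, 4]],
    [[1, 0], [0, 1]],
]
search = [
    [[0, 0, 0, 0], [0, 0, 1, 2], [0, 0, 3, 4], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]],
]

response = xcorr_depthwise(search, template)
print(peak_location(response))  # (1, 2)
```

In this toy case the response peaks where the template pattern sits in the search region. Plain unnormalized correlation does not guarantee that in general, which is one reason practical trackers rely on learned features and additional prediction heads.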
Pages: 4277-4286
Page count: 10