SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks

Cited by: 1613
Authors
Li, Bo [1]
Wu, Wei [1]
Wang, Qiang [2]
Zhang, Fangyi [3]
Xing, Junliang [2]
Yan, Junjie [1]
Affiliations
[1] SenseTime Res, Hong Kong, Peoples R China
[2] CASIA, NLPR, Beijing, Peoples R China
[3] ICT, VIPL, Beijing, Peoples R China
Source
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00441
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Siamese network-based trackers formulate tracking as convolutional feature cross-correlation between a target template and a search region. However, Siamese trackers still have an accuracy gap compared with state-of-the-art algorithms, and they cannot take advantage of features from deep networks such as ResNet-50 or deeper. In this work, we prove that the core reason is the lack of strict translation invariance. Through comprehensive theoretical analysis and experimental validation, we break this restriction with a simple yet effective spatial-aware sampling strategy and successfully train a ResNet-driven Siamese tracker with a significant performance gain. Moreover, we propose a new model architecture that performs layer-wise and depthwise aggregations, which not only further improves the accuracy but also reduces the model size. We conduct extensive ablation studies to demonstrate the effectiveness of the proposed tracker, which currently obtains the best results on five large tracking benchmarks: OTB2015, VOT2018, UAV123, LaSOT, and TrackingNet.
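As a rough illustration of two mechanisms the abstract names, here is a minimal NumPy sketch. It assumes single-sample feature maps in (channels, height, width) layout and reads "depthwise aggregation" as correlating template and search features channel by channel, so that each channel keeps its own response map instead of being summed into one. It also reads "spatial-aware sampling" as placing the target at a uniformly random offset from the center of the training search crop, so the network does not learn a bias toward the center. The function names (`depthwise_xcorr`, `xcorr`, `shifted_search_crop`) and the `max_shift` parameter are hypothetical, the loops favor clarity over speed, and this is not the authors' implementation.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the authors' code.


def depthwise_xcorr(search, kernel):
    """Correlate template and search features channel by channel.

    search: (C, Hs, Ws) search-region features; kernel: (C, Hk, Wk) template
    features. Valid mode, stride 1. Returns (C, Hs-Hk+1, Ws-Wk+1): one
    response map per channel instead of a single summed map.
    """
    c, hs, ws = search.shape
    ck, hk, wk = kernel.shape
    if c != ck:
        raise ValueError("template and search features need the same channel count")
    ho, wo = hs - hk + 1, ws - wk + 1
    out = np.empty((c, ho, wo), dtype=np.result_type(search, kernel))
    for i in range(ho):
        for j in range(wo):
            window = search[:, i:i + hk, j:j + wk]
            # Per-channel dot product between the window and the template.
            out[:, i, j] = (window * kernel).sum(axis=(1, 2))
    return out


def xcorr(search, kernel):
    """Plain cross-correlation: sum the per-channel responses into one map."""
    return depthwise_xcorr(search, kernel).sum(axis=0)


def shifted_search_crop(image, target_center, size, max_shift, rng):
    """Crop a size x size search window with the target randomly off-center.

    image: 2-D array; target_center: integer (row, col) inside the image.
    The target is displaced by a uniform integer (dy, dx) in
    [-max_shift, max_shift] from the crop center (assumes max_shift < size // 2).
    Returns the crop and the target's (row, col) inside it.
    """
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    # Move the window opposite to the shift so the target lands at
    # (size // 2 + dy, size // 2 + dx) inside the crop.
    top = target_center[0] - dy - size // 2
    left = target_center[1] - dx - size // 2
    # Pad with the image mean so windows crossing the border stay in bounds.
    pad = size + max_shift
    padded = np.pad(image, pad, mode="constant", constant_values=image.mean())
    crop = padded[top + pad:top + pad + size, left + pad:left + pad + size]
    return crop, (size // 2 + dy, size // 2 + dx)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative feature sizes, chosen only for the example.
    template = rng.standard_normal((256, 7, 7))
    search = rng.standard_normal((256, 31, 31))
    print(depthwise_xcorr(search, template).shape)  # (256, 25, 25)
    print(xcorr(search, template).shape)            # (25, 25)
```

Keeping one response map per channel leaves room for a later layer (for example a 1x1 convolution) to learn how to mix channels, whereas `xcorr` collapses them immediately; that contrast is the intuition behind the depthwise variant, offered here as an assumption rather than a restatement of the paper.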
Pages: 4277-4286
Number of pages: 10