Deeper Siamese Network With Stronger Feature Representation for Visual Tracking

Cited: 4
Authors
Zhang, Chaoyi [1 ]
Wang, Howard [2 ]
Wen, Jiwei [1 ]
Peng, Li [1 ]
Affiliations
[1] Jiangnan Univ, Sch Internet Things Engn, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Jiangsu, Peoples R China
[2] Univ Auckland, Dept Elect Comp & Software Engn, Auckland 1010, New Zealand
Funding
National Natural Science Foundation of China;
Keywords
Visual tracking; Siamese network; channel attention mechanism;
DOI
10.1109/ACCESS.2020.3005511
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Siamese network based visual tracking has drawn considerable attention recently due to its balanced accuracy and speed. Methods of this type typically train a relatively shallow twin network offline and measure similarity online via a cross-correlation operation between the feature maps generated by the last convolutional layer for the target and search regions, in order to locate the object. Nevertheless, a single feature map extracted from the last layer of a shallow network is insufficient to describe the target appearance and is sensitive to distractors, which can mislead the similarity response map and make the tracker drift easily. To enhance tracking accuracy and robustness while maintaining real-time speed, this paper makes three improvements to the above tracking paradigm: reforming the backbone network, fusing hierarchical features, and applying a channel attention mechanism. Firstly, we introduce a modified, deeper VGG16 backbone network, which extracts more powerful features that help distinguish the target from distractors. Secondly, we fuse diverse features extracted from deep and shallow layers to exploit both the semantic and the spatial information of the target. Thirdly, we incorporate a novel lightweight residual channel attention mechanism into the backbone network, which widens the weight gap between different channels and helps the network pay more attention to dominant features. Extensive experimental results on the OTB100 and VOT2018 benchmarks demonstrate that our tracker outperforms several state-of-the-art real-time methods in accuracy and efficiency.
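To make the cross-correlation paradigm concrete, here is a minimal sketch of how a template (target) feature map can be matched against a search-region feature map to produce a similarity response map. The shapes and the naive sliding-window loop are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def cross_correlation(template, search):
    """Slide the template feature map over the search feature map and
    take the inner product at each offset (SiamFC-style similarity).

    template: (C, th, tw) target features
    search:   (C, sh, sw) search-region features
    returns:  (sh - th + 1, sw - tw + 1) response map
    """
    c, th, tw = template.shape
    _, sh, sw = search.shape
    response = np.empty((sh - th + 1, sw - tw + 1))
    for y in range(response.shape[0]):
        for x in range(response.shape[1]):
            patch = search[:, y:y + th, x:x + tw]
            response[y, x] = np.sum(patch * template)
    return response
```

In real trackers this operation is computed as a single convolution with the template features as the kernel, and the argmax of the response map gives the predicted target location.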
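The residual channel attention idea can be sketched as a squeeze-and-excitation style gate with a skip connection. The layer sizes, weight names (`w1`, `w2`), and the plain-NumPy formulation below are illustrative assumptions, not the authors' exact design:

```python
import numpy as np

def residual_channel_attention(x, w1, w2):
    """Re-weight feature channels and add the input back.

    x:  (C, H, W) feature map
    w1: (C // r, C) bottleneck weights (r = reduction ratio)
    w2: (C, C // r) expansion weights
    """
    squeeze = x.mean(axis=(1, 2))                 # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)        # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # per-channel sigmoid weights
    return x + x * gate[:, None, None]            # residual re-weighting
```

The sigmoid gate stretches the gap between strong and weak channels, while the residual addition keeps the original features flowing so the attention branch can emphasize dominant channels without suppressing the rest entirely.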
Pages: 119094-119104
Page count: 11