Spatial feature embedding for robust visual object tracking

Cited by: 2
Authors
Liu, Kang [1]
Liu, Long [1]
Yang, Shangqi [1]
Fu, Zhihao [1]
Affiliations
[1] Xian Univ Technol, Sch Automat & Informat Engn, Xian, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
computer vision; distance learning; image motion analysis; object tracking; NETWORKS;
DOI
10.1049/cvi2.12263
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, the offline-trained Siamese pipeline has drawn wide attention due to its outstanding tracking performance. However, existing Siamese trackers rely on offline training to extract 'universal' features, which are insufficient to distinguish the target from fluctuating interference when embedding the information of the two branches, leading to inaccurate classification and localisation. In addition, Siamese trackers employ a pre-defined scale for cropping the search candidate region based on the previous frame's result, which can easily introduce redundant background noise (clutter, similar objects etc.), affecting the tracker's robustness. To solve these problems, the authors propose two novel sub-networks for spatial feature embedding in robust object tracking. Specifically, the proposed spatial remapping (SRM) network enhances the feature discrepancy between target and distractor categories by online remapping, and improves the discriminative ability of the tracker in the embedding space. MAML is used to optimise the SRM network to ensure its adaptability to complex tracking scenarios. Moreover, a temporal information proposal-guided (TPG) network is introduced, which utilises a GRU model to dynamically predict the search scale based on temporal motion states and thereby reduce potential background interference. The two proposed networks are integrated into two popular trackers, namely SiamFC++ and TransT, and the resulting trackers are denoted SiamSRMC and SiamSRMT, respectively. They achieve superior performance on six challenging benchmarks: OTB100, VOT2019, UAV123, GOT10K, TrackingNet and LaSOT. Moreover, the proposed trackers obtain competitive tracking performance compared with state-of-the-art trackers on the background clutter and similar object attributes, validating the effectiveness of the proposed method.
The anchor-free Siamese tracking method is prone to 'target-like' classification responses in areas with background clutter and similar distractors, affecting tracking accuracy. The authors propose a spatial remapping network that provides more discriminative metric features for accurate classification and localisation of similar objects, enhancing the tracker's ability to handle distractor regions.
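The abstract's point about a pre-defined search scale can be illustrated with a little arithmetic. The sketch below is not the authors' TPG network; it only assumes a square search region whose side is a scale factor times the geometric mean of the previous box's width and height (a common convention in Siamese trackers, used here as an assumption), and that the target lies fully inside that region. The function names `search_region_side` and `background_fraction` are hypothetical.

```python
import math


def search_region_side(box_w: float, box_h: float, scale: float) -> float:
    """Side length of a square search region centred on the previous box.

    Assumption: side = scale * sqrt(w * h). This is an illustrative
    convention, not the cropping rule used in the paper.
    """
    return scale * math.sqrt(box_w * box_h)


def background_fraction(box_w: float, box_h: float, scale: float) -> float:
    """Fraction of the search region that is background rather than target.

    Assumes the target box lies fully inside the search region.
    """
    side = search_region_side(box_w, box_h, scale)
    region_area = side * side
    target_area = box_w * box_h
    return max(0.0, 1.0 - target_area / region_area)


if __name__ == "__main__":
    # Previous-frame box of 40 x 60 pixels, cropped at two fixed scales.
    for scale in (2.0, 4.0):
        frac = background_fraction(40.0, 60.0, scale)
        print(f"scale={scale}: background fraction = {frac:.4f}")
```

Under these assumptions the background fraction is 1 - 1/scale², so doubling the scale from 2 to 4 raises it from 75% to about 94% of the crop. This is the kind of redundant background that a dynamically predicted search scale is meant to reduce.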
Pages: 540-556
Number of pages: 17