Dual Feature Fusion Tracking With Combined Cross-Correlation and Transformer

Cited by: 1
Authors
Che, Chao [1 ]
Fu, Yanyun [2 ]
Shi, Wenxi [3 ,4 ]
Zhu, Zhansheng [3 ,4 ]
Wang, Deyong [3 ,4 ]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830017, Peoples R China
[2] Beijing Acad Sci & Technol, Beijing 100035, Peoples R China
[3] Key Lab Big Data Xinjiang Social Secur Risk, Urumqi 830011, Peoples R China
[4] Xinjiang Lianhaichuangzhi Informat Technol Co Ltd, Urumqi 830011, Peoples R China
Keywords
Transformers; target tracking; correlation; object tracking; kernel; computer vision; visualization; cross-correlation; transformer; local matching; global dependency; visual tracking
DOI
10.1109/ACCESS.2023.3346044
CLC number
TP [automation technology, computer technology]
Subject classification code
0812
Abstract
Siamese networks have found applications in many fields, most notably object tracking, thanks to their remarkable speed and accuracy. Siamese trackers rely on cross-correlation to score the similarity between the target template and the search region. However, because cross-correlation is a local matching operation, it cannot effectively capture global context. Transformer-based feature fusion, by contrast, captures long-range dependencies and richer semantic information, but lacks the localized edge information needed to distinguish the target from the background. Since cross-correlation fusion and Transformer fusion have complementary strengths, we combine them and propose a dual feature fusion tracker (SiamCT) that captures both the local correlations and the global dependencies between the target and the search region. Specifically, we construct two parallel feature fusion paths based on cross-correlation and the Transformer. For cross-correlation fusion, we adopt the more efficient two-dimensional pixel-wise cross-correlation (TDPC), which correlates features along both the spatial and channel dimensions; this interaction of multidimensional information yields more accurate feature fusion. The fused features are then augmented with coordinate attention (CA) to inject orientation-aware positional information. For Transformer fusion, we introduce cos-based linear attention (ClA) to improve the Transformer's ability to acquire global context. Extensive experiments show that SiamCT outperforms leading methods on the GOT-10k, LaSOT, TrackingNet, and OTB100 benchmarks. In particular, it achieves state-of-the-art performance on GOT-10k, with an AO score of 70.6% and SR0.5/SR0.75 scores of 80.5% and 65.9%, respectively.
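To illustrate the pixel-wise cross-correlation idea the abstract describes (the spatial half of TDPC), each pixel of the template feature map can be treated as a 1x1 kernel whose feature vector is correlated against every location of the search-region feature map, yielding one similarity map per template pixel. The sketch below is our own illustrative NumPy formulation, not the authors' implementation; the function name and shapes are assumptions.

```python
import numpy as np

def pixelwise_xcorr(template: np.ndarray, search: np.ndarray) -> np.ndarray:
    """Spatial pixel-wise cross-correlation (illustrative sketch).

    template: (C, Hz, Wz) feature map of the target template
    search:   (C, Hx, Wx) feature map of the search region
    returns:  (Hz*Wz, Hx, Wx) stack of similarity maps, one per template pixel
    """
    C, Hz, Wz = template.shape
    # Flatten the template: each column is one pixel's C-dim feature vector,
    # acting as a 1x1 correlation kernel over the search features.
    kernels = template.reshape(C, Hz * Wz)
    # out[k, h, w] = sum_c kernels[c, k] * search[c, h, w]
    return np.einsum('ck,chw->khw', kernels, search)
```

Each of the resulting Hz*Wz maps highlights where the corresponding template pixel matches in the search region; a channel-dimension correlation (the other half of TDPC) would analogously compare matching channels rather than matching pixels.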
Pages: 144966-144977
Page count: 12