Transformer Tracking

Cited by: 1112
Authors
Chen, Xin [1]
Yan, Bin [1]
Zhu, Jiawen [1]
Wang, Dong [1]
Yang, Xiaoyun [3]
Lu, Huchuan [1,2]
Affiliations
[1] Dalian Univ Technol, Sch Informat & Commun Engn, Dalian, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Guangdong, Peoples R China
[3] Remark AI, Las Vegas, NV USA
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR46437.2021.00803
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Correlation plays a critical role in the tracking field, especially in recent popular Siamese-based trackers. The correlation operation is a simple way to fuse features by measuring the similarity between the template and the search region. However, the correlation operation itself is a local linear matching process, which tends to lose semantic information and fall into local optima, and this may be the bottleneck in designing high-accuracy tracking algorithms. Is there a better feature fusion method than correlation? To address this issue, inspired by the Transformer, this work presents a novel attention-based feature fusion network, which effectively combines the template and search region features using attention alone. Specifically, the proposed method includes an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. Finally, we present a Transformer tracking method (named TransT) based on a Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and a classification and regression head. Experiments show that TransT achieves very promising results on six challenging datasets, especially on the large-scale LaSOT, TrackingNet, and GOT-10k benchmarks. Our tracker runs at approximately 50 fps on a GPU.
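The two modules named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention with no learned projections, positional encodings, normalization, or feed-forward layers, and the function names (`self_attend`, `cross_attend`), token counts, and channel width are hypothetical choices made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query, key, value):
    # Scaled dot-product attention over 2-D (tokens, channels) arrays.
    d = query.shape[-1]
    weights = softmax(query @ key.T / np.sqrt(d), axis=-1)
    return weights @ value, weights


def self_attend(x):
    # Self-attention with a residual connection: each token attends
    # to every token of the same feature map.
    out, _ = attention(x, x, x)
    return x + out


def cross_attend(x, other):
    # Cross-attention with a residual connection: queries come from x,
    # keys and values come from the other branch.
    out, _ = attention(x, other, other)
    return x + out


rng = np.random.default_rng(0)
template = rng.standard_normal((16, 32))  # assumed: 16 template tokens, 32 channels
search = rng.standard_normal((64, 32))    # assumed: 64 search-region tokens

template_ctx = self_attend(template)
search_ctx = self_attend(search)
fused = cross_attend(search_ctx, template_ctx)
print(fused.shape)  # (64, 32)
```

Unlike a correlation map, which scores each search location against the template with a single linear match, every fused search token here is a similarity-weighted mixture of all template tokens, so the output keeps the full channel dimension.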
Pages: 8122-8131
Page count: 10