Transformer Tracking

Cited by: 904
Authors
Chen, Xin [1]
Yan, Bin [1]
Zhu, Jiawen [1]
Wang, Dong [1]
Yang, Xiaoyun [3]
Lu, Huchuan [1,2]
Affiliations
[1] Dalian Univ Technol, Sch Informat & Commun Engn, Dalian, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Guangdong, Peoples R China
[3] Remark AI, Las Vegas, NV USA
Source
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR46437.2021.00803
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Correlation plays a critical role in the tracking field, especially in recent popular Siamese-based trackers. The correlation operation is a simple fusion method that considers the similarity between the template and the search region. However, correlation is itself a local linear matching process, which loses semantic information and easily falls into local optima; this may be the bottleneck in designing high-accuracy tracking algorithms. Is there a better feature fusion method than correlation? To address this issue, inspired by Transformer, this work presents a novel attention-based feature fusion network, which effectively combines the template and search region features using attention alone. Specifically, the proposed method includes an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. Finally, we present a Transformer tracking method (named TransT) based on a Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and a classification and regression head. Experiments show that TransT achieves very promising results on six challenging datasets, especially on the large-scale LaSOT, TrackingNet, and GOT-10k benchmarks. Our tracker runs at approximately 50 fps on a GPU.
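The abstract describes fusing template and search-region features with self-attention (ego-context augment) and cross-attention (cross-feature augment) instead of correlation. The sketch below illustrates that idea in PyTorch; the module names EgoContextAugment and CrossFeatureAugment, the feature dimension, head count, and token shapes are illustrative assumptions, not the authors' released implementation (the actual TransT stacks several such layers before a classification and regression head).

```python
# Minimal sketch of attention-based template/search fusion, assuming
# flattened backbone features of dimension 256. Not the official TransT code.
import torch
import torch.nn as nn


class EgoContextAugment(nn.Module):
    """Self-attention over one branch's own tokens (template or search)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (seq_len, batch, dim)
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)               # residual connection + norm


class CrossFeatureAugment(nn.Module):
    """Cross-attention: queries from one branch, keys/values from the other."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                 nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, q_feat, kv_feat):
        out, _ = self.attn(q_feat, kv_feat, kv_feat)
        x = self.norm1(q_feat + out)
        return self.norm2(x + self.ffn(x))      # feed-forward + norm


# Toy usage: enrich search-region tokens with template information.
template = torch.randn(64, 1, 256)    # e.g. 8x8 template patch tokens
search = torch.randn(256, 1, 256)     # e.g. 16x16 search-region tokens
eca, cfa = EgoContextAugment(), CrossFeatureAugment()
fused = cfa(eca(search), eca(template))
print(fused.shape)                    # torch.Size([256, 1, 256])
```

In this sketch the fused search tokens would then feed a prediction head; the real network alternates ego and cross attention over multiple layers rather than applying each once.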
Pages: 8122-8131
Number of pages: 10