Transformer Tracking

Cited by: 904
Authors
Chen, Xin [1]
Yan, Bin [1]
Zhu, Jiawen [1]
Wang, Dong [1]
Yang, Xiaoyun [3]
Lu, Huchuan [1,2]
Affiliations
[1] Dalian Univ Technol, Sch Informat & Commun Engn, Dalian, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Guangdong, Peoples R China
[3] Remark AI, Las Vegas, NV USA
Source
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR46437.2021.00803
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Correlation plays a critical role in the tracking field, especially in recent popular Siamese-based trackers. The correlation operation is a simple fusion method that considers the similarity between the template and the search region. However, correlation is itself a local linear matching process, which loses semantic information and easily falls into local optima; this may be the bottleneck in designing high-accuracy tracking algorithms. Is there a better feature fusion method than correlation? To address this issue, inspired by Transformer, this work presents a novel attention-based feature fusion network, which effectively combines the template and search region features using attention alone. Specifically, the proposed method includes an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. Finally, we present a Transformer tracking method (named TransT) based on a Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and a classification and regression head. Experiments show that TransT achieves very promising results on six challenging datasets, especially on the large-scale LaSOT, TrackingNet, and GOT-10k benchmarks. Our tracker runs at approximately 50 fps on a GPU.
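The abstract describes fusing template and search-region features with self-attention (ego-context augment) and cross-attention (cross-feature augment) instead of correlation. The sketch below illustrates that idea in PyTorch; the module names EgoContextAugment and CrossFeatureAugment, the feature dimension, head count, and token shapes are illustrative assumptions, not the authors' released implementation (the actual TransT stacks several such layers before a classification and regression head).

```python
# Minimal sketch of attention-based template/search fusion, assuming
# flattened backbone features of dimension 256. Not the official TransT code.
import torch
import torch.nn as nn


class EgoContextAugment(nn.Module):
    """Self-attention over one branch's own tokens (template or search)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (seq_len, batch, dim)
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)               # residual connection + norm


class CrossFeatureAugment(nn.Module):
    """Cross-attention: queries from one branch, keys/values from the other."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                 nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, q_feat, kv_feat):
        out, _ = self.attn(q_feat, kv_feat, kv_feat)
        x = self.norm1(q_feat + out)
        return self.norm2(x + self.ffn(x))      # feed-forward + norm


# Toy usage: enrich search-region tokens with template information.
template = torch.randn(64, 1, 256)    # e.g. 8x8 template patch tokens
search = torch.randn(256, 1, 256)     # e.g. 16x16 search-region tokens
eca, cfa = EgoContextAugment(), CrossFeatureAugment()
fused = cfa(eca(search), eca(template))
print(fused.shape)                    # torch.Size([256, 1, 256])
```

In this sketch the fused search tokens would then feed a prediction head; the real network alternates ego and cross attention over multiple layers rather than applying each once.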
Pages: 8122-8131
Number of pages: 10