AiATrack: Attention in Attention for Transformer Visual Tracking

Cited by: 173
Authors
Gao, Shenyuan [1 ]
Zhou, Chunluan [2 ]
Ma, Chao [3 ]
Wang, Xinggang [1 ]
Yuan, Junsong [4 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] Wormpex AI Res, Bellevue, WA USA
[3] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[4] SUNY Buffalo, Buffalo, NY USA
Source
COMPUTER VISION, ECCV 2022, PT XXII | 2022, Vol. 13682
Funding
National Natural Science Foundation of China; US National Science Foundation; National Key R&D Program of China
Keywords
Visual tracking; Attention mechanism; Vision transformer;
DOI
10.1007/978-3-031-20047-2_9
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role. However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement. To address this issue, we propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors. Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking. Moreover, we propose a streamlined Transformer tracking framework, dubbed AiATrack, by introducing efficient feature reuse and target-background embeddings to make full use of temporal references. Experiments show that our tracker achieves state-of-the-art performance on six tracking benchmarks while running at a real-time speed. Code and models are publicly available at https://github.com/Little-Podi/AiATrack.
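The abstract describes the mechanism only at a high level: an inner attention refines the outer attention's query-key correlations by seeking consensus among the correlation vectors before they are normalized and used for aggregation. Below is a minimal PyTorch sketch of that idea for illustration; the class name AiAAttention, the inner_dim size, and the single-head residual layout are assumptions made here for brevity, not the authors' released implementation (see the linked repository for the actual code).

```python
# Minimal sketch of the attention-in-attention (AiA) idea, assuming a single-head
# attention whose query-key correlation map is refined by a small inner attention
# over correlation vectors before softmax. Names such as AiAAttention and inner_dim
# are illustrative, not taken from the authors' released code.
import torch
import torch.nn as nn


class AiAAttention(nn.Module):
    def __init__(self, dim: int, inner_dim: int = 64):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # Inner attention: each correlation vector (one per key position, length =
        # number of queries) is embedded into a small space so the vectors can
        # exchange information and reach a consensus. LazyLinear infers the input
        # size (the number of query positions) on the first forward call.
        self.inner_q = nn.LazyLinear(inner_dim)
        self.inner_k = nn.LazyLinear(inner_dim)
        self.inner_scale = inner_dim ** -0.5

    def forward(self, query, key, value):
        # query: (B, Lq, C); key, value: (B, Lk, C)
        q, k, v = self.to_q(query), self.to_k(key), self.to_v(value)
        corr = torch.einsum('bqc,bkc->bqk', q, k) * self.scale  # raw correlations (B, Lq, Lk)

        # Treat each key's correlation vector as a token and let the tokens attend
        # to one another; correlations that agree reinforce each other.
        tokens = corr.transpose(1, 2)                           # (B, Lk, Lq)
        iq, ik = self.inner_q(tokens), self.inner_k(tokens)     # (B, Lk, inner_dim)
        inner_attn = torch.softmax(
            torch.einsum('bic,bjc->bij', iq, ik) * self.inner_scale, dim=-1)
        consensus = inner_attn @ tokens                         # (B, Lk, Lq)

        # Residually enhance the correlation map, then aggregate values as usual.
        corr = corr + consensus.transpose(1, 2)
        attn = torch.softmax(corr, dim=-1)
        return attn @ v                                         # (B, Lq, C)


if __name__ == '__main__':
    # Cross-attention style usage: 100 search-region queries attend to 400
    # reference tokens; self-attention would pass the same tensor three times.
    aia = AiAAttention(dim=256)
    x, ref = torch.randn(2, 100, 256), torch.randn(2, 400, 256)
    print(aia(x, ref, ref).shape)  # torch.Size([2, 100, 256])
```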
Pages: 146-164
Number of pages: 19