AiATrack: Attention in Attention for Transformer Visual Tracking

Cited by: 173
Authors
Gao, Shenyuan [1 ]
Zhou, Chunluan [2 ]
Ma, Chao [3 ]
Wang, Xinggang [1 ]
Yuan, Junsong [4 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] Wormpex AI Res, Bellevue, WA USA
[3] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[4] SUNY Buffalo, Buffalo, NY USA
Source
COMPUTER VISION, ECCV 2022, PT XXII | 2022, Vol. 13682
Funding
National Natural Science Foundation of China; US National Science Foundation; National Key R&D Program of China
Keywords
Visual tracking; Attention mechanism; Vision transformer;
DOI
10.1007/978-3-031-20047-2_9
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role. However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement. To address this issue, we propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors. Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking. Moreover, we propose a streamlined Transformer tracking framework, dubbed AiATrack, by introducing efficient feature reuse and target-background embeddings to make full use of temporal references. Experiments show that our tracker achieves state-of-the-art performance on six tracking benchmarks while running at a real-time speed. Code and models are publicly available at https://github.com/Little-Podi/AiATrack.
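The abstract describes the mechanism only at a high level: an inner attention refines the outer attention's query-key correlations by seeking consensus among the correlation vectors before they are normalized and used for aggregation. Below is a minimal PyTorch sketch of that idea for illustration; the class name AiAAttention, the inner_dim size, and the single-head residual layout are assumptions made here for brevity, not the authors' released implementation (see the linked repository for the actual code).

```python
# Minimal sketch of the attention-in-attention (AiA) idea, assuming a single-head
# attention whose query-key correlation map is refined by a small inner attention
# over correlation vectors before softmax. Names such as AiAAttention and inner_dim
# are illustrative, not taken from the authors' released code.
import torch
import torch.nn as nn


class AiAAttention(nn.Module):
    def __init__(self, dim: int, inner_dim: int = 64):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # Inner attention: each correlation vector (one per key position, length =
        # number of queries) is embedded into a small space so the vectors can
        # exchange information and reach a consensus. LazyLinear infers the input
        # size (the number of query positions) on the first forward call.
        self.inner_q = nn.LazyLinear(inner_dim)
        self.inner_k = nn.LazyLinear(inner_dim)
        self.inner_scale = inner_dim ** -0.5

    def forward(self, query, key, value):
        # query: (B, Lq, C); key, value: (B, Lk, C)
        q, k, v = self.to_q(query), self.to_k(key), self.to_v(value)
        corr = torch.einsum('bqc,bkc->bqk', q, k) * self.scale  # raw correlations (B, Lq, Lk)

        # Treat each key's correlation vector as a token and let the tokens attend
        # to one another; correlations that agree reinforce each other.
        tokens = corr.transpose(1, 2)                           # (B, Lk, Lq)
        iq, ik = self.inner_q(tokens), self.inner_k(tokens)     # (B, Lk, inner_dim)
        inner_attn = torch.softmax(
            torch.einsum('bic,bjc->bij', iq, ik) * self.inner_scale, dim=-1)
        consensus = inner_attn @ tokens                         # (B, Lk, Lq)

        # Residually enhance the correlation map, then aggregate values as usual.
        corr = corr + consensus.transpose(1, 2)
        attn = torch.softmax(corr, dim=-1)
        return attn @ v                                         # (B, Lq, C)


if __name__ == '__main__':
    # Cross-attention style usage: 100 search-region queries attend to 400
    # reference tokens; self-attention would pass the same tensor three times.
    aia = AiAAttention(dim=256)
    x, ref = torch.randn(2, 100, 256), torch.randn(2, 400, 256)
    print(aia(x, ref, ref).shape)  # torch.Size([2, 100, 256])
```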
Pages: 146-164
Number of pages: 19