MSTrack: Visual Tracking with Multi-scale Attention

Citations: 0
Authors
Song, Chunlin [1]
Yao, Yu [1]
Guo, Jianhui [2]
Li, Lunbo [2]
Affiliations
[1] Hangzhou Zhiyuan Res Inst Co Ltd, Hangzhou 310024, Peoples R China
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Source
PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON COMPUTER AND MULTIMEDIA TECHNOLOGY, ICCMT 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Visual object tracking; Vision Transformer; Multi-scale attention;
DOI
10.1145/3675249.3675309
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The attention mechanism has been widely applied to computer vision tasks because it excels at capturing global feature dependencies. In visual tracking, however, conventional attention mechanisms model feature dependencies on feature maps of a single size, which limits the tracker's ability to handle target scale variations. To address this issue, we propose a novel multi-scale attention mechanism that captures global dependencies between the template and the search region from feature maps of various sizes, enhancing the sensitivity of the tracker to scale variations. Furthermore, our attention residual operation employs an attention prior to guide the modeling of small-size feature dependencies, effectively prioritizing the focus on primary target information. Extensive experimental results on the widely used tracking datasets GOT-10k, LaSOT, TNL2K, and TrackingNet demonstrate that our proposed one-stream tracker MSTrack outperforms all previous state-of-the-art trackers while running at 100.5 FPS.
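The abstract does not give the layer definition, so the following is only a rough sketch of the general idea of attending over feature maps of several sizes, not the MSTrack implementation. Everything in it is an assumption made for illustration: the helper names `avg_pool_tokens` and `multi_scale_attention`, the use of average pooling to build the smaller feature maps, the absence of learned projections, and the plain averaging of per-scale outputs. The attention residual operation mentioned in the abstract is not modeled.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def avg_pool_tokens(tokens, side, factor):
    """Average-pool a (side*side, dim) row-major token grid by `factor` per axis.

    Hypothetical helper: the pooling operator used by the paper is not stated.
    """
    dim = tokens.shape[1]
    grid = tokens.reshape(side // factor, factor, side // factor, factor, dim)
    return grid.mean(axis=(1, 3)).reshape(-1, dim)


def multi_scale_attention(template, search, search_side, factors=(1, 2)):
    """Search tokens attend to template tokens plus search tokens at several sizes.

    Assumed design: one attention pass per pooling factor, outputs averaged.
    """
    dim = search.shape[1]
    outputs = []
    for factor in factors:
        pooled = avg_pool_tokens(search, search_side, factor)
        keys = np.concatenate([template, pooled], axis=0)
        # Scaled dot-product attention; keys double as values in this sketch.
        weights = softmax(search @ keys.T / np.sqrt(dim), axis=-1)
        outputs.append(weights @ keys)
    return np.mean(outputs, axis=0)


# Usage with random features: a 4x4 template grid and an 8x8 search grid.
rng = np.random.default_rng(0)
template = rng.standard_normal((16, 8))
search = rng.standard_normal((64, 8))
fused = multi_scale_attention(template, search, search_side=8)
```

Pooling only the key/value side keeps the output at the full search resolution, which is one simple way to mix coarse and fine context; other designs (pooling queries, learned downsampling) are equally plausible readings of the abstract.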
Pages: 337-344
Page count: 8