MSTrack: Visual Tracking with Multi-scale Attention

Cited by: 0

Authors:

Song, Chunlin [1]

Yao, Yu [1]

Guo, Jianhui [2]

Li, Lunbo [2]

Affiliations:

[1] Hangzhou Zhiyuan Res Inst Co Ltd, Hangzhou 310024, Peoples R China

[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China

Source:

PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON COMPUTER AND MULTIMEDIA TECHNOLOGY, ICCMT 2024 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

Visual object tracking; Vision Transformer; Multi-scale attention;

DOI:

10.1145/3675249.3675309

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The attention mechanism has been widely applied to various computer vision tasks because it excels at capturing global feature dependencies. However, for visual tracking tasks, conventional attention mechanisms model feature dependencies based on feature maps of only one size, which hinders the tracker's ability to handle target scale variations effectively. To address this issue, we propose a novel multi-scale attention mechanism that captures global dependencies between the template and the search region from feature maps of various sizes, enhancing the sensitivity of the tracker to scale variations. Furthermore, our attention residual operation employs an attention prior to guide the modeling of small-size feature dependencies, effectively prioritizing the focus on primary target information. Extensive experimental results on the widely used tracking datasets GOT-10k, LaSOT, TNL2K, and TrackingNet demonstrate that our proposed one-stream tracker MSTrack outperforms all previous state-of-the-art trackers while running at 100.5 FPS.
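Illustrative note (not from the paper): the abstract does not give the exact formulation, so the following is only a minimal sketch of one way multi-scale attention with an attention prior could look. Everything here is an assumption made for illustration: the function names (`multi_scale_attention`, `pool_vectors`, `pool_scalars`), the use of 1-D average pooling as a stand-in for spatial downsampling, the reuse of pooled full-scale logits as the "attention prior", and the simple averaging of the two branches. It uses plain Python lists rather than a deep learning framework.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of attention logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def pool_vectors(tokens, stride):
    # Average each group of `stride` consecutive tokens (hypothetical
    # 1-D stand-in for downsampling a feature map).
    groups = [tokens[i:i + stride] for i in range(0, len(tokens), stride)]
    return [[sum(col) / len(g) for col in zip(*g)] for g in groups]


def pool_scalars(values, stride):
    # Same pooling, applied to one row of attention logits.
    chunks = [values[i:i + stride] for i in range(0, len(values), stride)]
    return [sum(c) / len(c) for c in chunks]


def attention_logits(queries, keys):
    # Scaled dot-product scores between every query and every key.
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [[scale * sum(q * k for q, k in zip(qv, kv)) for kv in keys]
            for qv in queries]


def attend(logits, values):
    # Softmax-weighted sum of value vectors for each query.
    dim = len(values[0])
    weights = [softmax(row) for row in logits]
    return [[sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
            for row in weights]


def multi_scale_attention(template, search, stride=2):
    # One-stream assumption: template and search tokens form one sequence.
    n_t = len(template)
    tokens = template + search

    # Branch 1: attention over the full-size token set.
    full_logits = attention_logits(tokens, tokens)
    full_out = attend(full_logits, tokens)

    # Branch 2: attention over a smaller (pooled) token set.
    small = pool_vectors(template, stride) + pool_vectors(search, stride)
    small_logits = attention_logits(tokens, small)

    # Assumed "attention residual": pooled full-size logits act as a
    # prior that is added to the small-size logits before the softmax.
    prior = [pool_scalars(row[:n_t], stride) + pool_scalars(row[n_t:], stride)
             for row in full_logits]
    guided = [[s + p for s, p in zip(s_row, p_row)]
              for s_row, p_row in zip(small_logits, prior)]
    small_out = attend(guided, small)

    # Assumed fusion: plain average of the two branches.
    return [[(a + b) / 2 for a, b in zip(f, s)]
            for f, s in zip(full_out, small_out)]
```

Calling `multi_scale_attention(template, search)` with two lists of equal-length feature vectors returns one fused vector per input token. A real tracker would use learned query, key, and value projections and 2-D feature maps, which this sketch leaves out.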


Pages: 337-344

Page count: 8
