A robust attention-enhanced network with transformer for visual tracking

Cited by: 0
Authors
Fengwei Gu
Jun Lu
Chengtao Cai
Affiliations
[1] Harbin Engineering University, College of Intelligent Systems Science and Engineering
[2] Key Laboratory of Intelligent Technology and Application of Marine Equipment (Harbin Engineering University), Ministry of Education
Source: Multimedia Tools and Applications, 2023, 82(26)
Keywords
Visual tracking; Attention-enhanced network; Local feature information association module; Global feature information fusion module; Prediction network;
DOI: not available
Abstract
Recently, Siamese-based trackers have become particularly popular. In these trackers, a correlation module fuses the feature information from the template and the search region to produce the response results. However, video sequences contain very rich contextual information and feature dependencies, and a simple correlation module cannot integrate this useful information efficiently. As a result, such trackers suffer from information loss and local optimal solutions. In this work, we propose a novel attention-enhanced network with a Transformer variant for robust visual tracking. The proposed method carefully designs a local feature information association (LFIA) module and a global feature information fusion (GFIF) module based on the attention mechanism, which effectively exploit contextual information and feature dependencies to enhance feature representations. Our approach casts visual tracking as a bounding-box prediction problem, using only a simple prediction network for object localization without any prior knowledge. The resulting tracker, RANformer, achieves state-of-the-art performance on 7 popular tracking benchmarks while meeting real-time requirements at a speed exceeding 40 FPS.
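The abstract describes replacing a plain correlation module with attention-based fusion of template and search-region features. The paper's actual LFIA/GFIF designs are not given here, so the following is only a minimal sketch of the general idea it builds on: scaled dot-product cross-attention in which queries come from the search region and keys/values from the template, letting every search location attend to all template locations instead of a fixed local correlation window. All names and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(search, template):
    """Hypothetical fusion step: enhance search-region features with
    template features via scaled dot-product cross-attention
    (queries = search tokens, keys/values = template tokens)."""
    d = search.shape[-1]
    scores = search @ template.T / np.sqrt(d)   # (N_search, N_template)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return search + weights @ template          # residual fusion

rng = np.random.default_rng(0)
search = rng.standard_normal((64, 32))    # 64 search tokens, feature dim 32
template = rng.standard_normal((16, 32))  # 16 template tokens
fused = cross_attention_fusion(search, template)
print(fused.shape)  # (64, 32)
```

Unlike depthwise correlation, this fusion is global: the attention weights form a dense search-to-template affinity map, which matches the abstract's motivation of capturing long-range feature dependencies rather than purely local responses.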
Pages: 40761-40782 (21 pages)
Related papers (50 total)
  • [1] A robust attention-enhanced network with transformer for visual tracking
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (26) : 40761 - 40782
  • [2] APLNet: Attention-enhanced progressive learning network
    Zhang, Hui
    Kang, Danqing
    He, Haibo
    Wang, Fei-Yue
    NEUROCOMPUTING, 2020, 371 : 166 - 176
  • [3] AiATrack: Attention in Attention for Transformer Visual Tracking
    Gao, Shenyuan
    Zhou, Chunluan
    Ma, Chao
    Wang, Xinggang
    Yuan, Junsong
    COMPUTER VISION, ECCV 2022, PT XXII, 2022, 13682 : 146 - 164
  • [4] CHANNEL ATTENTION BASED GENERATIVE NETWORK FOR ROBUST VISUAL TRACKING
    Hu, Ying
    Xuan, Hanyu
    Yang, Jian
    Yan, Yan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4082 - 4086
  • [5] Evota: an enhanced visual object tracking network with attention mechanism
    Zhao, An
    Zhang, Yi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 : 24939 - 24960
  • [6] Evota: an enhanced visual object tracking network with attention mechanism
    Zhao, An
    Zhang, Yi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (8) : 24939 - 24960
  • [7] Attention-enhanced neural network models for turbulence simulation
    Peng, Wenhui
    Yuan, Zelong
    Wang, Jianchun
    PHYSICS OF FLUIDS, 2022, 34 (02)
  • [8] Sparse Transformer Visual Tracking Network Based on Second-Order Attention
    Yang, Xiaolin
    Hou, Zhiqiang
    Guo, Fan
    Ma, Sugang
    Yu, Wangsheng
    Yang, Xiaobao
    2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024, 2024, : 571 - 579
  • [9] FETrack: Feature-Enhanced Transformer Network for Visual Object Tracking
    Liu, Hang
    Huang, Detian
    Lin, Mingxin
    APPLIED SCIENCES-BASEL, 2024, 14 (22):
  • [10] MTAtrack: Multilevel transformer attention for visual tracking
    An, Dong
    Zhang, Fan
    Zhao, Yuqian
    Luo, Biao
    Yang, Chunhua
    Chen, Baifan
    Yu, Lingli
    OPTICS AND LASER TECHNOLOGY, 2023, 166