A robust attention-enhanced network with transformer for visual tracking

Cited by: 0
Authors
Fengwei Gu
Jun Lu
Chengtao Cai
Affiliations
[1] Harbin Engineering University, College of Intelligent Systems Science and Engineering
[2] Key Laboratory of Intelligent Technology and Application of Marine Equipment (Harbin Engineering University), Ministry of Education
Source: Multimedia Tools and Applications, 2023, 82(26)
Keywords
Visual tracking; Attention-enhanced network; Local feature information association module; Global feature information fusion module; Prediction network;
DOI: not available
Abstract
Recently, Siamese-based trackers have become particularly popular. In these trackers, a correlation module fuses the feature information from the template and the search region to produce the response results. However, video sequences contain very rich contextual information and feature dependencies, and a simple correlation module cannot integrate this useful information efficiently. As a result, such trackers suffer from information loss and local optimal solutions. In this work, we propose a novel attention-enhanced network with a Transformer variant for robust visual tracking. The proposed method carefully designs a local feature information association (LFIA) module and a global feature information fusion (GFIF) module based on the attention mechanism, which effectively exploit contextual information and feature dependencies to enhance feature representations. Our approach casts visual tracking as a bounding-box prediction problem, using only a simple prediction network for object localization without any prior knowledge. The resulting tracker, RANformer, achieves state-of-the-art performance on 7 popular tracking benchmarks while meeting real-time requirements at a speed exceeding 40 FPS.
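The abstract describes replacing a plain correlation module with attention-based fusion of template and search-region features. The paper's actual LFIA/GFIF designs are not given here, so the following is only a minimal sketch of the general idea it builds on: scaled dot-product cross-attention in which queries come from the search region and keys/values from the template, letting every search location attend to all template locations instead of a fixed local correlation window. All names and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(search, template):
    """Hypothetical fusion step: enhance search-region features with
    template features via scaled dot-product cross-attention
    (queries = search tokens, keys/values = template tokens)."""
    d = search.shape[-1]
    scores = search @ template.T / np.sqrt(d)   # (N_search, N_template)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return search + weights @ template          # residual fusion

rng = np.random.default_rng(0)
search = rng.standard_normal((64, 32))    # 64 search tokens, feature dim 32
template = rng.standard_normal((16, 32))  # 16 template tokens
fused = cross_attention_fusion(search, template)
print(fused.shape)  # (64, 32)
```

Unlike depthwise correlation, this fusion is global: the attention weights form a dense search-to-template affinity map, which matches the abstract's motivation of capturing long-range feature dependencies rather than purely local responses.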
Pages: 40761-40782 (21 pages)
Related papers (50 total)
  • [1] A robust attention-enhanced network with transformer for visual tracking
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (26) : 40761 - 40782
  • [2] APLNet: Attention-enhanced progressive learning network
    Zhang, Hui
    Kang, Danqing
    He, Haibo
    Wang, Fei-Yue
    NEUROCOMPUTING, 2020, 371 : 166 - 176
  • [3] AiATrack: Attention in Attention for Transformer Visual Tracking
    Gao, Shenyuan
    Zhou, Chunluan
    Ma, Chao
    Wang, Xinggang
    Yuan, Junsong
    COMPUTER VISION, ECCV 2022, PT XXII, 2022, 13682 : 146 - 164
  • [4] CHANNEL ATTENTION BASED GENERATIVE NETWORK FOR ROBUST VISUAL TRACKING
    Hu, Ying
    Xuan, Hanyu
    Yang, Jian
    Yan, Yan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4082 - 4086
  • [5] Evota: an enhanced visual object tracking network with attention mechanism
    Zhao, An
    Zhang, Yi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 : 24939 - 24960
  • [6] Evota: an enhanced visual object tracking network with attention mechanism
    Zhao, An
    Zhang, Yi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (8) : 24939 - 24960
  • [7] Attention-enhanced neural network models for turbulence simulation
    Peng, Wenhui
    Yuan, Zelong
    Wang, Jianchun
    PHYSICS OF FLUIDS, 2022, 34 (02)
  • [8] Sparse Transformer Visual Tracking Network Based on Second-Order Attention
    Yang, Xiaolin
    Hou, Zhiqiang
    Guo, Fan
    Ma, Sugang
    Yu, Wangsheng
    Yang, Xiaobao
    2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024, 2024, : 571 - 579
  • [9] FETrack: Feature-Enhanced Transformer Network for Visual Object Tracking
    Liu, Hang
    Huang, Detian
    Lin, Mingxin
    APPLIED SCIENCES-BASEL, 2024, 14 (22):
  • [10] MTAtrack: Multilevel transformer attention for visual tracking
    An, Dong
    Zhang, Fan
    Zhao, Yuqian
    Luo, Biao
    Yang, Chunhua
    Chen, Baifan
    Yu, Lingli
    OPTICS AND LASER TECHNOLOGY, 2023, 166