Temporal relation transformer for robust visual tracking with dual-memory learning

Cited by: 0
Authors
Nie, Guohao [1]
Wang, Xingmei [1,2]
Yan, Zining [1,3]
Xu, Xiaoyuan [1]
Liu, Bo [4]
Affiliations
[1] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Harbin Engn Univ, Natl Key Lab Underwater Acoust Technol, Harbin 150001, Peoples R China
[3] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[4] Key Lab Avion Syst Integrated Technol, Shanghai 200030, Peoples R China
Keywords
Visual tracking; Transformer; Temporal relation modeling; Memory mechanism; Object tracking
DOI
10.1016/j.asoc.2024.112229
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent transformer trackers mostly associate multiple reference images with the search area to adapt to the target's changing appearance. However, they ignore the learned cross-relations between the target and its surroundings, which makes it difficult to build coherent contextual models for specific target instances. This paper presents a Temporal Relation Transformer Tracker (TRTT) for robust visual tracking, which provides a concise approach to modeling temporal relations through dual target memory learning. Specifically, a temporal relation transformer network generates paired memories based on static and dynamic templates, which are reinforced interactively. The memories contain implicit relation hints that capture the relations between the tracked object and its immediate surroundings. More importantly, to keep target instance identities consistent between frames, the relation hints from previous frames are transferred to the current frame to merge temporal contextual attention. Our method also incorporates mechanisms for reusing favorable cross-relations and instance-specific features, thereby overcoming background interference in complex spatio-temporal interactions through a sequential constraint. Furthermore, we design a memory token sparsification method that leverages the key points of the target to eliminate interference and optimize attention calculations. Extensive experiments demonstrate that our method surpasses advanced trackers on 8 challenging benchmarks while maintaining real-time running speed.
Pages: 16
Related papers
50 in total
  • [21] A robust attention-enhanced network with transformer for visual tracking
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (26) : 40761 - 40782
  • [22] Shape-Guided Dual-Memory Learning for 3D Anomaly Detection
    Chu, Yu-Min
    Liu, Chieh
    Hsieh, Ting-I
    Chen, Hwann-Tzong
    Liu, Tyng-Luh
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [23] RPformer: A Robust Parallel Transformer for Visual Tracking in Complex Scenes
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [24] A robust attention-enhanced network with transformer for visual tracking
    Fengwei Gu
    Jun Lu
    Chengtao Cai
    Multimedia Tools and Applications, 2023, 82 : 40761 - 40782
  • [25] Robust Visual Tracking based on Deep Spatial Transformer Features
    Zhang, Ximing
    Wang, Mingang
    Wei, Jinkang
    Cui, Can
    PROCEEDINGS OF THE 30TH CHINESE CONTROL AND DECISION CONFERENCE (2018 CCDC), 2018, : 5036 - 5041
  • [26] RTSformer: A Robust Toroidal Transformer With Spatiotemporal Features for Visual Tracking
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    Zhu, Qidan
    Ju, Zhaojie
    IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, 2024, 54 (02) : 214 - 225
  • [27] Extension of the dual-memory model of test-enhanced learning to distributions and individual differences
    Timothy C. Rickard
    Psychonomic Bulletin & Review, 2020, 27 : 783 - 790
  • [28] Robust Visual Tracking with Discrimination Dictionary Learning
    Wang, Yuanyun
    Deng, Chengzhi
    Wang, Jun
    Tian, Wei
    Wang, Shengqian
    ADVANCES IN MULTIMEDIA, 2018, 2018
  • [29] ROBUST VISUAL TRACKING VIA TRANSFER LEARNING
    Luo, Wenhan
    Li, Xi
    Li, Wei
    Hu, Weiming
    2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2011, : 485 - 488
  • [30] Learning Adaptive Metric for Robust Visual Tracking
    Jiang, Nan
    Liu, Wenyu
    Wu, Ying
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2011, 20 (08) : 2288 - 2300