Temporal relation transformer for robust visual tracking with dual-memory learning

Cited: 0
Authors
Nie, Guohao [1 ]
Wang, Xingmei [1 ,2 ]
Yan, Zining [1 ,3 ]
Xu, Xiaoyuan [1 ]
Liu, Bo [4 ]
Affiliations
[1] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Harbin Engn Univ, Natl Key Lab Underwater Acoust Technol, Harbin 150001, Peoples R China
[3] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[4] Key Lab Avion Syst Integrated Technol, Shanghai 200030, Peoples R China
Keywords
Visual tracking; Transformer; Temporal relation modeling; Memory mechanism; Object tracking
DOI
10.1016/j.asoc.2024.112229
CLC number
TP18 [Theory of artificial intelligence];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recently, transformer trackers have mostly associated multiple reference images with the search area to adapt to the changing appearance of the target. However, they ignore the learned cross-relations between the target and its surroundings, making it difficult to build coherent contextual models for specific target instances. This paper presents a Temporal Relation Transformer Tracker (TRTT) for robust visual tracking, providing a concise approach to modeling temporal relations through dual target memory learning. Specifically, a temporal relation transformer network generates paired memories based on static and dynamic templates, which are reinforced interactively. Each memory contains implicit relation hints that capture the relations between the tracked object and its immediate surroundings. More importantly, to ensure consistency of target instance identities between frames, the relation hints from previous frames are transferred to the current frame and merged via temporal contextual attention. Our method also incorporates mechanisms for reusing favorable cross-relations and instance-specific features, thereby overcoming background interference in complex spatio-temporal interactions through a sequential constraint. Furthermore, we design a memory token sparsification method that leverages key points of the target to eliminate interference and streamline attention calculations. Extensive experiments demonstrate that our method surpasses advanced trackers on 8 challenging benchmarks while maintaining real-time running speed.
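The abstract's two core mechanisms, carrying relation hints from the previous frame into the current frame's attention and pruning memory tokens by key-point scores, can be illustrated with a minimal sketch. This is a toy illustration under assumed interfaces, not the paper's actual TRTT code: the function names, the additive blending of hints into attention logits, and the top-k pruning rule are all hypothetical simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def merge_temporal_attention(q, k, v, prev_hint, alpha=0.5):
    """Toy temporal contextual attention (hypothetical form):
    blend the current frame's cross-attention logits with relation
    hints carried over from the previous frame, so the attended
    output stays consistent with the earlier target-surroundings
    relations. Returns the output and the logits, which serve as
    the next frame's hint."""
    logits = q @ k.T / np.sqrt(q.shape[-1])          # (Nq, Nk) attention logits
    logits = (1 - alpha) * logits + alpha * prev_hint  # inject prior relations
    return softmax(logits) @ v, logits

def sparsify_memory(tokens, scores, keep=4):
    """Toy memory token sparsification: keep only the `keep` memory
    tokens with the highest key-point scores, preserving order."""
    idx = np.argsort(scores)[::-1][:keep]
    return tokens[np.sort(idx)]
```

With zero hints the blend reduces to plain scaled dot-product attention; as hints accumulate, attention is biased toward relations observed in earlier frames, which is the rough intuition behind the sequential constraint described above.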
Pages: 16
Related papers
50 items in total
  • [31] Siamese Visual Tracking with Robust Adaptive Learning
    Zhang, Wancheng
    Chen, Zhi
    Liu, Peizhong
    Deng, Jianhua
    PROCEEDINGS OF 2019 IEEE 13TH INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY, AND IDENTIFICATION (IEEE-ASID'2019), 2019, : 153 - 157
  • [32] Robust visual tracking with discriminative sparse learning
    Lu, Xiaoqiang
    Yuan, Yuan
    Yan, Pingkun
    PATTERN RECOGNITION, 2013, 46 (07) : 1762 - 1771
  • [33] Extended Hierarchical Temporal Memory for Visual Object Tracking
    Krys, Sebastian
    Jankowski, Stanislaw
    PHOTONICS APPLICATIONS IN ASTRONOMY, COMMUNICATIONS, INDUSTRY, AND HIGH-ENERGY PHYSICS EXPERIMENTS 2011, 2011, 8008
  • [34] Dual attentional transformer for video visual relation prediction
    Qu, Mingcheng
    Deng, Ganlin
    Di, Donglin
    Cui, Jianxun
    Su, Tonghua
    NEUROCOMPUTING, 2023, 550
  • [35] Robust Visual Tracking via Multi-Scale Spatio-Temporal Context Learning
    Xue, Wanli
    Xu, Chao
    Feng, Zhiyong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2018, 28 (10) : 2849 - 2860
  • [36] Online learning and joint optimization of combined spatial-temporal models for robust visual tracking
    Zhou, Tao
    Bhaskar, Harish
    Liu, Fanghui
    Yang, Jie
    Cai, Ping
    NEUROCOMPUTING, 2017, 226 : 221 - 237
  • [37] Learning reliable modal weight with transformer for robust RGBT tracking
    Feng, Mingzheng
    Su, Jianbo
    KNOWLEDGE-BASED SYSTEMS, 2022, 249
  • [38] Memory Network With Pixel-Level Spatio-Temporal Learning for Visual Object Tracking
    Zhou, Zechu
    Zhou, Xinyu
    Chen, Zhaoyu
    Guo, Pinxue
    Liu, Qian-Yu
    Zhang, Wenqiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (11) : 6897 - 6911
  • [39] ROBUST ONLINE VISUAL TRACKING VIA A TEMPORAL ENSEMBLE FRAMEWORK
    Guan, Hao
    Xue, Xiangyang
    2016 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2016,
  • [40] Deep Spatial and Temporal Network for Robust Visual Object Tracking
    Teng, Zhu
    Xing, Junliang
    Wang, Qiang
    Zhang, Baopeng
    Fan, Jianping
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 1762 - 1775