Temporal relation transformer for robust visual tracking with dual-memory learning

Cited by: 0
Authors
Nie, Guohao [1 ]
Wang, Xingmei [1 ,2 ]
Yan, Zining [1 ,3 ]
Xu, Xiaoyuan [1 ]
Liu, Bo [4 ]
Affiliations
[1] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Harbin Engn Univ, Natl Key Lab Underwater Acoust Technol, Harbin 150001, Peoples R China
[3] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[4] Key Lab Avion Syst Integrated Technol, Shanghai 200030, Peoples R China
Keywords
Visual tracking; Transformer; Temporal relation modeling; Memory mechanism; Object tracking
DOI
10.1016/j.asoc.2024.112229
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recently, transformer trackers have mostly associated multiple reference images with the search area to adapt to the changing appearance of the target. However, they ignore the learned cross-relations between the target and its surroundings, making it difficult to build coherent contextual models for specific target instances. This paper presents a Temporal Relation Transformer Tracker (TRTT) for robust visual tracking, providing a concise approach to modeling temporal relations through dual target memory learning. Specifically, a temporal relation transformer network generates paired memories based on static and dynamic templates, which are reinforced interactively. Each memory contains implicit relation hints that capture the relations between the tracked object and its immediate surroundings. More importantly, to keep target instance identities consistent between frames, the relation hints from previous frames are transferred to the current frame to merge temporal contextual attention. Our method also incorporates mechanisms for reusing favorable cross-relations and instance-specific features, thereby overcoming background interference in complex spatio-temporal interactions through a sequential constraint. Furthermore, we design a memory token sparsification method that leverages the key points of the target to eliminate interference and streamline attention computation. Extensive experiments demonstrate that our method surpasses advanced trackers on eight challenging benchmarks while maintaining real-time speed.
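The memory token sparsification step described above can be pictured as a top-k selection: tokens are scored by their relevance to the target's key points, and only the highest-scoring tokens are kept for subsequent attention. The sketch below is illustrative only; the function name, the relevance scores, and the `keep_ratio` parameter are assumptions, not the paper's actual API.

```python
import numpy as np

def sparsify_memory_tokens(memory, scores, keep_ratio=0.5):
    """Keep only the memory tokens most relevant to the target key points.

    memory: (N, C) array of memory tokens.
    scores: (N,) relevance score of each token (e.g. attention to key points).
    Returns the retained tokens and their original indices.
    """
    n_keep = max(1, int(len(memory) * keep_ratio))
    idx = np.argsort(scores)[::-1][:n_keep]  # indices of top-scoring tokens
    idx = np.sort(idx)                       # preserve original token order
    return memory[idx], idx

# Toy usage: 8 memory tokens with 4 channels each.
rng = np.random.default_rng(0)
mem = rng.standard_normal((8, 4))
rel = rng.random(8)
kept, idx = sparsify_memory_tokens(mem, rel, keep_ratio=0.5)
```

Discarding low-relevance tokens both removes background interference from the memory and reduces the cost of the attention computation, since attention scales with the number of memory tokens.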
Pages: 16
Related papers
50 records in total
  • [41] Structural spatio-temporal transform for robust visual tracking
    Tang, Yazhe
    Lao, Mingjie
    Lin, Feng
    Wu, Denglu
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 1105 - 1109
  • [42] Effective and Robust: A Discriminative Temporal Learning Transformer for Satellite Videos
    Zhang, Xin
    Jiao, Licheng
    Li, Lingling
    Liu, Xu
    Liu, Fang
    Yang, Shuyuan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [43] Learning Spatial-Frequency Transformer for Visual Object Tracking
    Tang, Chuanming
    Wang, Xiao
    Bai, Yuanchao
    Wu, Zhe
    Zhang, Jianlin
    Huang, Yongmei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5102 - 5116
  • [44] DPT-tracker: Dual pooling transformer for efficient visual tracking
    Fang, Yang
    Xie, Bailian
    Khairuddin, Uswah
    Min, Zijian
    Jiang, Bingbing
    Li, Weisheng
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2024, 9 (04) : 948 - 959
  • [45] Adaptive Online Learning Based Robust Visual Tracking
    Yang, Weiming
    Zhao, Meirong
    Huang, Yinguo
    Zheng, Yelong
    IEEE ACCESS, 2018, 6 : 14790 - 14798
  • [46] Robust Visual Tracking With Multitask Joint Dictionary Learning
    Fan, Heng
    Xiang, Jinhai
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2017, 27 (05) : 1018 - 1030
  • [47] Robust visual tracking using information theoretical learning
    Weifu Ding
    Jiangshe Zhang
    Annals of Mathematics and Artificial Intelligence, 2017, 80 : 113 - 129
  • [48] Incremental robust local dictionary learning for visual tracking
    Bai, Shanshan
    Liu, Risheng
    Su, Zhixun
    Zhang, Changcheng
    Jin, Wei
    2014 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2014,
  • [49] Learning Robust Gaussian Process Regression for Visual Tracking
    Zheng, Linyu
    Tang, Ming
    Wang, Jinqiao
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 1219 - 1225
  • [50] Learning spatially regularized similarity for robust visual tracking
    Zhou, Xiuzhuang
    Huo, Qirun
    Shang, Yuanyuan
    Xu, Min
    Ding, Hui
    IMAGE AND VISION COMPUTING, 2017, 60 : 134 - 141