RPformer: A Robust Parallel Transformer for Visual Tracking in Complex Scenes

Cited by: 44
|
Authors
Gu, Fengwei [1 ,2 ]
Lu, Jun [1 ,2 ]
Cai, Chengtao [1 ,2 ]
Affiliations
[1] Harbin Engn Univ, Coll Intelligent Syst Sci & Engn, Harbin 150001, Peoples R China
[2] Harbin Engn Univ, Minist Educ, Key Lab Intelligent Technol & Applicat Marine Equ, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Heilongjiang Province;
Keywords
Target tracking; Transformers; Correlation; Visualization; Feature extraction; Information filters; Kernel; Attention mechanism; complex scenes; feature fusion head (FFH); parallel Transformer network; visual tracking; NETWORK;
DOI
10.1109/TIM.2022.3170972
CLC Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809
Abstract
The Siamese architecture has shown remarkable performance in the field of visual tracking. Although existing Siamese-based tracking methods have achieved a relative balance between accuracy and speed, the performance of many trackers in complex scenes is often unsatisfactory, mainly due to interference factors such as target scale changes, occlusion, and fast movement. In these cases, many trackers cannot sufficiently exploit the target feature information and suffer from information loss. In this work, we propose a novel parallel Transformer network architecture to achieve robust visual tracking. The proposed method introduces the Transformer-1 module, the Transformer-2 module, and the feature fusion head (FFH), all based on the attention mechanism. The Transformer-1 module and the Transformer-2 module serve as complementary branches in the parallel architecture. The FFH integrates the feature information of the two parallel branches, efficiently exploiting the feature dependence between the template and the search region and comprehensively mining rich contextual information. Finally, by combining the core ideas of Siamese and Transformer, we present a simple and robust tracking framework called RPformer, which does not require any prior knowledge and avoids the trouble of adjusting hyperparameters. Extensive experiments on seven tracking benchmarks show that the proposed method outperforms state-of-the-art trackers while meeting real-time requirements at a running speed exceeding 50.0 frames/s.
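The parallel-branch idea in the abstract can be illustrated with a minimal numpy sketch: two branches (each reduced here to plain self-attention) process the template and the search region, and a fusion head cross-attends from search tokens to template tokens. All function names, shapes, and the single-attention-layer simplification are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: the core operation shared by
    # both branches and the fusion head in this sketch.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def branch(x):
    # Stand-in for one Transformer branch: bare self-attention.
    # (A real branch would add projections, FFN, residuals, norms.)
    return attention(x, x, x)

def feature_fusion_head(template_feat, search_feat):
    # FFH sketch: each search-region token aggregates template
    # information via cross-attention, modelling their dependence.
    return attention(search_feat, template_feat, template_feat)

rng = np.random.default_rng(0)
template = rng.standard_normal((16, 64))  # template tokens
search = rng.standard_normal((64, 64))    # search-region tokens

# Two parallel, complementary branches.
z1 = branch(template)  # Transformer-1 path
z2 = branch(search)    # Transformer-2 path
fused = feature_fusion_head(z1, z2)
print(fused.shape)  # (64, 64): one fused vector per search token
```

In the actual tracker, the fused features would then feed classification and regression heads to localize the target in the search region.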
Pages: 14
Related Papers
50 records in total
  • [1] A Robust Visual Tracking Method Based on Reconstruction Patch Transformer Tracking
    Chen, Hui
    Wang, Zhenhai
    Tian, Hongyu
    Yuan, Lutao
    Wang, Xing
    Leng, Peng
    SENSORS, 2022, 22 (17)
  • [2] A Fusion Approach for Robust Visual Object Tracking in Crowd Scenes
    Oh, Tae-Hyun
    Joo, Kyungdon
    Kim, Junsik
    Park, Jaesik
    Kweon, In So
    2014 11TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI), 2014, : 558 - 560
  • [3] Robust Detection and Tracking Algorithm of Multiple Objects in Complex Scenes
    Hu, Hong-Yu
    Qu, Zhao-Wei
    Li, Zhi-Hui
    Wang, Qing-Nian
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2014, 8 (05): : 2485 - 2490
  • [4] Robust Object Tracking Algorithm for Autonomous Vehicles in Complex Scenes
    Cao, Jingwei
    Song, Chuanxue
    Song, Shixin
    Xiao, Feng
    Zhang, Xu
    Liu, Zhiyang
    Ang, Marcelo H., Jr.
    REMOTE SENSING, 2021, 13 (16)
  • [5] Propagating prior information with transformer for robust visual object tracking
    Wu, Yue
    Cai, Chengtao
    Yeo, Chai Kiat
    MULTIMEDIA SYSTEMS, 2024, 30 (05)
  • [6] A robust attention-enhanced network with transformer for visual tracking
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (26) : 40761 - 40782
  • [7] Robust Visual Tracking based on Deep Spatial Transformer Features
    Zhang, Ximing
    Wang, Mingang
    Wei, Jinkang
    Cui, Can
    PROCEEDINGS OF THE 30TH CHINESE CONTROL AND DECISION CONFERENCE (2018 CCDC), 2018, : 5036 - 5041
  • [8] RTSformer: A Robust Toroidal Transformer With Spatiotemporal Features for Visual Tracking
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    Zhu, Qidan
    Ju, Zhaojie
    IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, 2024, 54 (02) : 214 - 225
  • [9] Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
    Wang, Ning
    Zhou, Wengang
    Wang, Jie
    Li, Houqiang
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1571 - 1580