SslTransT: Self-supervised pre-training visual object tracking with Transformers

Authors
Cai, Yannan [1]
Tan, Ke [1]
Wei, Zhenzhong [1]
Affiliations
[1] Beihang Univ, Sch Instrumentat Sci & Optoelect Engn, Key Lab Precis Optomechatron Technol, Minist Educ, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Self-supervised; Hybrid CNN-transformer; Visual object tracking; 6D pose measurement system; BENCHMARK;
DOI
10.1016/j.optcom.2024.130329
Chinese Library Classification number
O43 [Optics];
Subject classification codes
070207; 0803;
Abstract
Transformer-based visual object trackers outperform conventional CNN-based counterparts but come with additional computational overhead. Existing Transformer-based trackers also rely on large-scale annotated data and long training periods. To address this issue, we introduce a self-supervised pretext task, named target localization, which randomly crops the target and then pastes it onto various background images. This copy-paste-transform data augmentation strategy can composite sufficient training data and facilitate routine training. In addition, freezing the CNN backbone during pre-training and randomly adjusting the template and search area factors further lead to faster training convergence. Extensive experiments on both public tracking benchmarks and real aircraft flight test videos demonstrate that our proposed tracker, SslTransT, significantly outperforms the baseline while requiring only half the training time. Furthermore, we apply SslTransT to a 6D pose measurement system based on vision and laser ranging, achieving excellent tracking results while running in real time.
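The copy-paste idea behind the target localization pretext task can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_crop`, `hflip`, and `paste_target` are hypothetical, images are plain nested lists rather than tensors, and a horizontal flip stands in for whatever transforms the paper actually uses. The point it shows is that pasting a crop at a known position yields a bounding-box label without manual annotation.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w patch from a random position in a 2-D image."""
    img_h, img_w = len(image), len(image[0])
    if crop_h > img_h or crop_w > img_w:
        raise ValueError("crop must fit inside the image")
    y = rng.randint(0, img_h - crop_h)
    x = rng.randint(0, img_w - crop_w)
    return [row[x:x + crop_w] for row in image[y:y + crop_h]]


def hflip(patch):
    """Mirror a patch left to right (an assumed stand-in for 'transform')."""
    return [row[::-1] for row in patch]


def paste_target(background, target, rng):
    """Paste target onto a copy of background at a random position.

    Returns the composite image and the box (x, y, w, h) of the pasted
    target, which serves as the localization label.
    """
    bg_h, bg_w = len(background), len(background[0])
    t_h, t_w = len(target), len(target[0])
    if t_h > bg_h or t_w > bg_w:
        raise ValueError("target must fit inside the background")
    x = rng.randint(0, bg_w - t_w)
    y = rng.randint(0, bg_h - t_h)
    composite = [row[:] for row in background]  # leave the input untouched
    for dy in range(t_h):
        composite[y + dy][x:x + t_w] = target[dy]
    return composite, (x, y, t_w, t_h)


def make_sample(source, background, crop_h, crop_w, rng):
    """Crop, optionally flip, then paste: one synthetic training sample."""
    patch = random_crop(source, crop_h, crop_w, rng)
    if rng.random() < 0.5:
        patch = hflip(patch)
    return paste_target(background, patch, rng)
```

Calling `make_sample(source, background, 2, 3, random.Random(0))` returns a composite image together with the box where the patch was placed, so each call produces a new labeled sample from unlabeled images.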
Pages: 10