Token Boosting for Robust Self-Supervised Visual Transformer Pre-training

Cited by: 2
Authors
Li, Tianjiao [1 ]
Foo, Lin Geng [1 ]
Hu, Ping [2 ]
Shang, Xindi [3 ]
Rahmani, Hossein [4 ]
Yuan, Zehuan [3 ]
Liu, Jun [1 ]
Affiliations
[1] Singapore Univ Technol & Design, Singapore, Singapore
[2] Boston Univ, Boston, MA 02215 USA
[3] ByteDance, Beijing, Peoples R China
[4] Univ Lancaster, Lancaster, England
Funding
National Research Foundation of Singapore; European Union Horizon 2020
Keywords
OBJECT;
DOI
10.1109/CVPR52729.2023.02301
CLC classification
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning with large-scale unlabeled data has become a powerful tool for pre-training Visual Transformers (VTs). However, prior works tend to overlook that, in real-world scenarios, the input data may be corrupted and unreliable. Pre-training VTs on such corrupted data can be challenging, especially when we pre-train via the masked autoencoding approach, where both the inputs and the masked "ground truth" targets can potentially be unreliable. To address this limitation, we introduce the Token Boosting Module (TBM), a plug-and-play component that enables a VT to learn to extract clean and robust features during masked autoencoding pre-training. We provide theoretical analysis showing how TBM improves pre-training with more robust and generalizable representations, thus benefiting downstream tasks. We conduct extensive experiments to analyze TBM's effectiveness, and results on four corrupted datasets demonstrate that TBM consistently improves performance on downstream tasks.
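The abstract describes TBM only at a high level: a plug-and-play module that lets a VT recover clean token features from corrupted inputs during masked-autoencoding pre-training. Below is a minimal PyTorch sketch of that idea, assuming a residual MLP denoiser inserted after each encoder block; the TokenBooster design and all hyperparameters here are hypothetical stand-ins, not the paper's actual architecture.

import torch
import torch.nn as nn

class TokenBooster(nn.Module):
    """Hypothetical residual denoiser: maps possibly-corrupted token
    features to cleaner ones via a learned correction term."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim); add a predicted correction.
        return tokens + self.mlp(self.norm(tokens))

class BoostedEncoder(nn.Module):
    """ViT-style encoder with a booster plugged in after each block."""
    def __init__(self, dim: int = 192, depth: int = 4, heads: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                       batch_first=True)
            for _ in range(depth)
        )
        self.boosters = nn.ModuleList(TokenBooster(dim) for _ in range(depth))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        for block, booster in zip(self.blocks, self.boosters):
            tokens = booster(block(tokens))  # clean features between blocks
        return tokens

if __name__ == "__main__":
    x = torch.randn(2, 49, 192)        # 2 images, 7x7 patch tokens, dim 192
    print(BoostedEncoder()(x).shape)   # torch.Size([2, 49, 192])

The residual form is one plausible choice: it lets each booster default to a near-identity mapping when the incoming tokens are already clean, so the module can be dropped into an existing encoder without disrupting it.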
Pages: 24027-24038 (12 pages)