Mixed Autoencoder for Self-supervised Visual Representation Learning

Cited by: 10
Authors
Chen, Kai [1 ]
Liu, Zhili [1 ,2 ]
Hong, Lanqing [2 ]
Xu, Hang [2 ]
Li, Zhenguo [2 ]
Yeung, Dit-Yan [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Huawei Noah's Ark Lab, Montreal, PQ, Canada
Keywords
DOI
10.1109/CVPR52729.2023.02178
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks by randomly masking image patches and reconstructing them. However, effective data augmentation strategies for MAE remain an open question, unlike in contrastive learning, where augmentation plays a central role. This paper studies the prevailing mixing augmentation for MAE. We first demonstrate that naive mixing actually degrades model performance due to an increase in mutual information (MI). To address this, we propose homologous recognition, an auxiliary pretext task that not only alleviates the MI increase by explicitly requiring each patch to recognize its homologous patches, but also performs object-aware self-supervised pre-training for better downstream dense perception performance. With extensive experiments, we demonstrate that our proposed Mixed Autoencoder (MixedAE) achieves state-of-the-art transfer results among masked image modeling (MIM) augmentations on different downstream tasks with significant efficiency. Specifically, our MixedAE outperforms MAE by +0.3% accuracy, +1.7 mIoU and +0.9 AP on ImageNet-1K, ADE20K and COCO respectively with a standard ViT-Base. Moreover, MixedAE surpasses iBOT, a strong MIM method combined with instance discrimination, while accelerating training by 2x. To the best of our knowledge, this is the first work to consider mixing for MIM from the perspective of pretext task design. Code will be made available.
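The abstract describes two ideas: mixing patches from different images during MAE pre-training, and a homologous-recognition pretext task in which each patch must identify the other patches that came from the same source image. The Python sketch below only illustrates those two ideas and is not the authors' implementation: the helper names (patchify, mix_patches, homologous_targets), the 16-pixel patch size, and the 50% mixing ratio are assumptions, and the random masking and the actual MixedAE training objective are omitted.

    # Illustrative sketch (not the authors' code) of patch mixing and
    # homologous-recognition targets, as described in the abstract.
    import torch

    def patchify(images: torch.Tensor, patch: int = 16) -> torch.Tensor:
        """(B, C, H, W) -> (B, N, patch*patch*C) sequence of flattened patches."""
        b, c, h, w = images.shape
        x = images.unfold(2, patch, patch).unfold(3, patch, patch)   # (B, C, H/p, W/p, p, p)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch * patch)
        return x                                                     # (B, N, D)

    def mix_patches(images: torch.Tensor, patch: int = 16):
        """Randomly replace a subset of each image's patches with patches taken
        from another image in the batch (a simple patch-level mixing scheme).
        Returns the mixed patch sequence and, for every patch, the index of the
        source image it came from, which is what homologous recognition needs."""
        patches = patchify(images, patch)                # (B, N, D)
        b, n, _ = patches.shape
        perm = torch.randperm(b)                         # partner image for each sample
        take_from_partner = torch.rand(b, n) < 0.5       # ~50% of patches swapped (assumed ratio)
        mixed = torch.where(take_from_partner.unsqueeze(-1), patches[perm], patches)
        source = torch.where(take_from_partner,
                             perm.unsqueeze(1).expand(b, n),
                             torch.arange(b).unsqueeze(1).expand(b, n))
        return mixed, source                             # (B, N, D), (B, N)

    def homologous_targets(source: torch.Tensor) -> torch.Tensor:
        """Per image, an (N, N) binary matrix whose (i, j) entry is 1 iff patches
        i and j of the mixed input originate from the same source image."""
        return (source.unsqueeze(2) == source.unsqueeze(1)).float()  # (B, N, N)

    if __name__ == "__main__":
        imgs = torch.randn(4, 3, 224, 224)
        mixed, src = mix_patches(imgs)
        targets = homologous_targets(src)
        print(mixed.shape, targets.shape)   # torch.Size([4, 196, 768]) torch.Size([4, 196, 196])

In this reading, the homologous-recognition targets give each patch an explicit label of which other patches share its source image, which is the signal the auxiliary pretext task would train against.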
Pages: 22742-22751
Page count: 10