MVEB: Self-Supervised Learning With Multi-View Entropy Bottleneck

Cited by: 1
Authors
Wen, Liangjian [1 ,2 ]
Wang, Xiasi [3 ]
Liu, Jianzhuang [4 ]
Xu, Zenglin [5 ,6 ]
Affiliations
[1] Southwestern Univ Finance & Econ, Sch Comp & Artificial Intelligence, Chengdu 610074, Peoples R China
[2] Southwestern Univ Finance & Econ, Res Inst Digital Econ & Interdisciplinary Sci, Chengdu 610074, Peoples R China
[3] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[4] Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
[5] Harbin Inst Technol Shenzhen, Shenzhen 150001, Peoples R China
[6] Pengcheng Lab, Shenzhen 518066, Peoples R China
Keywords
Task analysis; Entropy; Mutual information; Supervised learning; Feature extraction; Minimal sufficient representation; Representation learning; Self-supervised learning
DOI
10.1109/TPAMI.2024.3380065
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Self-supervised learning aims to learn representations that generalize effectively to downstream tasks. Many self-supervised approaches regard two views of an image as both the input and the self-supervised signal, assuming that each view contains the same task-relevant information and that the shared information is (approximately) sufficient for predicting downstream tasks. Recent studies show that discarding superfluous information not shared between the views can improve generalization. Hence, the ideal representation is sufficient for downstream tasks and contains minimal superfluous information, termed the minimal sufficient representation. One can learn this representation by maximizing the mutual information between the representation and the supervised view while eliminating superfluous information. Nevertheless, the computation of mutual information is notoriously intractable. In this work, we propose an objective termed the multi-view entropy bottleneck (MVEB) to learn the minimal sufficient representation effectively. MVEB simplifies minimal sufficient representation learning to maximizing both the agreement between the embeddings of two views and the differential entropy of the embedding distribution. Our experiments confirm that MVEB significantly improves performance. For example, it achieves a top-1 accuracy of 76.9% on ImageNet with a vanilla ResNet-50 backbone under linear evaluation. To the best of our knowledge, this is a new state-of-the-art result with ResNet-50.
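To make the objective described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of an MVEB-style loss: it maximizes the agreement between the embeddings of two augmented views plus a differential-entropy surrogate of the embedding distribution. The entropy term here is a simple nearest-neighbor (Kozachenko-Leonenko style) estimate used only as a stand-in; the paper's actual estimator, architecture, and hyperparameters are not reproduced, and the function name `mveb_style_loss` and weight `alpha` are assumptions for illustration.

```python
# Hypothetical sketch of an MVEB-style objective (not the paper's implementation):
# maximize (i) agreement between two view embeddings and (ii) a surrogate of the
# differential entropy of the embedding distribution.
import torch
import torch.nn.functional as F


def mveb_style_loss(z1: torch.Tensor, z2: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two augmented views of the same images."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)

    # Agreement term: cosine similarity between the two views' embeddings.
    alignment = (z1 * z2).sum(dim=1).mean()

    # Entropy surrogate: log squared distance to the nearest other embedding in the
    # batch (a nearest-neighbor entropy estimate, up to constants and scale).
    sim = z1 @ z1.t()                                   # (batch, batch) cosine similarities
    sq_dists = 2.0 - 2.0 * sim                          # squared Euclidean distances on the unit sphere
    eye = torch.eye(sq_dists.size(0), dtype=torch.bool, device=sq_dists.device)
    nn_sq = sq_dists.masked_fill(eye, float("inf")).min(dim=1).values
    entropy = torch.log(nn_sq.clamp_min(1e-8)).mean()

    # Maximize agreement and entropy by minimizing their negative sum.
    return -(alignment + alpha * entropy)


# Toy usage with random embeddings standing in for a backbone + projector output.
z1 = torch.randn(256, 128, requires_grad=True)
z2 = torch.randn(256, 128, requires_grad=True)
mveb_style_loss(z1, z2).backward()
```

In this sketch the alignment term plays the role of maximizing agreement between views, while the entropy surrogate spreads the embeddings over the unit sphere to avoid representation collapse; how the paper actually estimates the differential entropy should be taken from the full text, not from this illustration.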
Pages: 6097-6108
Number of pages: 12