Dissecting Supervised Contrastive Learning

Cited by: 0
|
Authors
Graf, Florian [1 ]
Hofer, Christoph D. [1 ]
Niethammer, Marc [2 ]
Kwitt, Roland [1 ]
Affiliations
[1] Univ Salzburg, Dept Comp Sci, Salzburg, Austria
[2] Univ N Carolina, Chapel Hill, NC USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139
Funding
Austrian Science Fund;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks. However, recent works show that one can directly optimize the encoder instead, to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective. In this work, we address the question of whether there are fundamental differences in the sought-for representation geometry in the output space of the encoder at minimal loss. Specifically, we prove, under mild assumptions, that both losses attain their minimum once the representations of each class collapse to the vertices of a regular simplex, inscribed in a hypersphere. We provide empirical evidence that this configuration is attained in practice and that reaching a close-to-optimal state typically indicates good generalization performance. Yet, the two losses show remarkably different optimization behavior. The number of iterations required to perfectly fit the data scales superlinearly with the amount of randomly flipped labels for the supervised contrastive loss. This is in contrast to the approximately linear scaling previously reported for networks trained with cross-entropy.
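For context on the objective the abstract refers to, below is a minimal NumPy sketch of a supervised contrastive loss in the form commonly used in the literature (the SupCon-style formulation with normalized embeddings and a temperature); the function name, temperature value, and toy data are illustrative assumptions and not taken from the paper.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Batch-wise supervised contrastive loss (SupCon-style sketch).

    z      : (n, d) array of encoder outputs, assumed L2-normalized.
    labels : (n,) integer class labels.
    """
    n = z.shape[0]
    sim = z @ z.T / temperature                 # pairwise similarities, scaled by temperature
    mask_self = np.eye(n, dtype=bool)           # exclude each anchor from its own denominator
    sim_exp = np.exp(sim)
    sim_exp[mask_self] = 0.0
    log_denom = np.log(sim_exp.sum(axis=1))     # log-sum over all other samples per anchor

    loss = 0.0
    for i in range(n):
        positives = np.where((labels == labels[i]) & ~mask_self[i])[0]
        if len(positives) == 0:
            continue                            # anchor without a positive pair contributes nothing
        # average negative log-likelihood of the positives against all other samples
        loss += -np.mean(sim[i, positives] - log_denom[i])
    return loss / n

# Toy usage: two classes whose embeddings have collapsed to antipodal points
# on the unit circle (the 2-class analogue of the regular-simplex configuration).
z = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]])
labels = np.array([0, 0, 1, 1])
print(supervised_contrastive_loss(z, labels))   # close to the minimal value
```

In this two-class toy example the per-class representations have collapsed to antipodal points on the hypersphere, which is the two-vertex instance of the simplex geometry described in the abstract, and the loss is correspondingly near its minimum.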
Pages: 10