Dissecting Supervised Contrastive Learning

Cited by: 0
|
Authors
Graf, Florian [1 ]
Hofer, Christoph D. [1 ]
Niethammer, Marc [2 ]
Kwitt, Roland [1 ]
Affiliations
[1] Univ Salzburg, Dept Comp Sci, Salzburg, Austria
[2] Univ N Carolina, Chapel Hill, NC USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139
Funding
Austrian Science Fund;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks. However, recent works show that one can directly optimize the encoder instead, to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective. In this work, we address the question of whether there are fundamental differences in the sought-for representation geometry in the output space of the encoder at minimal loss. Specifically, we prove, under mild assumptions, that both losses attain their minimum once the representations of each class collapse to the vertices of a regular simplex, inscribed in a hypersphere. We provide empirical evidence that this configuration is attained in practice and that reaching a close-to-optimal state typically indicates good generalization performance. Yet, the two losses show remarkably different optimization behavior. The number of iterations required to perfectly fit the data scales superlinearly with the amount of randomly flipped labels for the supervised contrastive loss. This is in contrast to the approximately linear scaling previously reported for networks trained with cross-entropy.
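For context on the objective the abstract refers to, below is a minimal NumPy sketch of a supervised contrastive loss in the form commonly used in the literature (the SupCon-style formulation with normalized embeddings and a temperature); the function name, temperature value, and toy data are illustrative assumptions and not taken from the paper.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Batch-wise supervised contrastive loss (SupCon-style sketch).

    z      : (n, d) array of encoder outputs, assumed L2-normalized.
    labels : (n,) integer class labels.
    """
    n = z.shape[0]
    sim = z @ z.T / temperature                 # pairwise similarities, scaled by temperature
    mask_self = np.eye(n, dtype=bool)           # exclude each anchor from its own denominator
    sim_exp = np.exp(sim)
    sim_exp[mask_self] = 0.0
    log_denom = np.log(sim_exp.sum(axis=1))     # log-sum over all other samples per anchor

    loss = 0.0
    for i in range(n):
        positives = np.where((labels == labels[i]) & ~mask_self[i])[0]
        if len(positives) == 0:
            continue                            # anchor without a positive pair contributes nothing
        # average negative log-likelihood of the positives against all other samples
        loss += -np.mean(sim[i, positives] - log_denom[i])
    return loss / n

# Toy usage: two classes whose embeddings have collapsed to antipodal points
# on the unit circle (the 2-class analogue of the regular-simplex configuration).
z = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]])
labels = np.array([0, 0, 1, 1])
print(supervised_contrastive_loss(z, labels))   # close to the minimal value
```

In this two-class toy example the per-class representations have collapsed to antipodal points on the hypersphere, which is the two-vertex instance of the simplex geometry described in the abstract, and the loss is correspondingly near its minimum.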
Pages: 10