Dissecting Supervised Contrastive Learning

Cited: 0
Authors
Graf, Florian [1 ]
Hofer, Christoph D. [1 ]
Niethammer, Marc [2 ]
Kwitt, Roland [1 ]
Affiliations
[1] Univ Salzburg, Dept Comp Sci, Salzburg, Austria
[2] Univ N Carolina, Chapel Hill, NC USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
Funding
Austrian Science Fund
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks. However, recent works show that one can instead directly optimize the encoder to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective. In this work, we address the question of whether there are fundamental differences in the sought-for representation geometry in the output space of the encoder at minimal loss. Specifically, we prove, under mild assumptions, that both losses attain their minimum once the representations of each class collapse to the vertices of a regular simplex inscribed in a hypersphere. We provide empirical evidence that this configuration is attained in practice and that reaching a close-to-optimal state typically indicates good generalization performance. Yet, the two losses show remarkably different optimization behavior: the number of iterations required to perfectly fit the data scales superlinearly with the amount of randomly flipped labels for the supervised contrastive loss, in contrast to the approximately linear scaling previously reported for networks trained with cross-entropy.
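The loss-minimizing geometry described in the abstract can be illustrated numerically. The following sketch (not the paper's code; the construction and variable names are assumptions for illustration) builds the standard simplex equiangular-tight-frame configuration for K classes and checks the two properties the abstract states: the class representatives lie on a unit hypersphere, and they form a regular simplex, i.e. all pairwise inner products equal -1/(K-1).

```python
import math

# Illustrative sketch: K class representatives arranged as the vertices of a
# regular simplex inscribed in the unit hypersphere. Vertex j is a scaled,
# mean-centered standard basis vector: sqrt(K/(K-1)) * (e_j - (1/K) * ones).
K = 5
scale = math.sqrt(K / (K - 1))
vertices = [
    [scale * ((1.0 if i == j else 0.0) - 1.0 / K) for i in range(K)]
    for j in range(K)
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Property 1: every vertex has unit norm (lies on the hypersphere).
norms = [math.sqrt(dot(v, v)) for v in vertices]

# Property 2: all pairwise inner products equal -1/(K-1) (regular simplex).
inner = [dot(vertices[i], vertices[j])
         for i in range(K) for j in range(K) if i != j]

print(all(abs(n - 1.0) < 1e-9 for n in norms))            # True
print(all(abs(p + 1.0 / (K - 1)) < 1e-9 for p in inner))  # True
```

Note that the K vertices sum to zero, so they span only a (K-1)-dimensional subspace; this is the maximally separated configuration of K points on a hypersphere that the abstract refers to.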
Pages: 10