Dissecting Supervised Contrastive Learning

Cited: 0
Authors
Graf, Florian [1 ]
Hofer, Christoph D. [1 ]
Niethammer, Marc [2 ]
Kwitt, Roland [1 ]
Affiliations
[1] Univ Salzburg, Dept Comp Sci, Salzburg, Austria
[2] Univ N Carolina, Chapel Hill, NC USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
Funding
Austrian Science Fund
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks. However, recent works show that one can instead directly optimize the encoder to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective. In this work, we address the question of whether there are fundamental differences in the sought-for representation geometry in the output space of the encoder at minimal loss. Specifically, we prove, under mild assumptions, that both losses attain their minimum once the representations of each class collapse to the vertices of a regular simplex inscribed in a hypersphere. We provide empirical evidence that this configuration is attained in practice and that reaching a close-to-optimal state typically indicates good generalization performance. Yet, the two losses show remarkably different optimization behavior: the number of iterations required to perfectly fit the data scales superlinearly with the amount of randomly flipped labels for the supervised contrastive loss, in contrast to the approximately linear scaling previously reported for networks trained with cross-entropy.
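The loss-minimizing geometry described in the abstract can be illustrated numerically. The following sketch (not the paper's code; the construction and variable names are assumptions for illustration) builds the standard simplex equiangular-tight-frame configuration for K classes and checks the two properties the abstract states: the class representatives lie on a unit hypersphere, and they form a regular simplex, i.e. all pairwise inner products equal -1/(K-1).

```python
import math

# Illustrative sketch: K class representatives arranged as the vertices of a
# regular simplex inscribed in the unit hypersphere. Vertex j is a scaled,
# mean-centered standard basis vector: sqrt(K/(K-1)) * (e_j - (1/K) * ones).
K = 5
scale = math.sqrt(K / (K - 1))
vertices = [
    [scale * ((1.0 if i == j else 0.0) - 1.0 / K) for i in range(K)]
    for j in range(K)
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Property 1: every vertex has unit norm (lies on the hypersphere).
norms = [math.sqrt(dot(v, v)) for v in vertices]

# Property 2: all pairwise inner products equal -1/(K-1) (regular simplex).
inner = [dot(vertices[i], vertices[j])
         for i in range(K) for j in range(K) if i != j]

print(all(abs(n - 1.0) < 1e-9 for n in norms))            # True
print(all(abs(p + 1.0 / (K - 1)) < 1e-9 for p in inner))  # True
```

Note that the K vertices sum to zero, so they span only a (K-1)-dimensional subspace; this is the maximally separated configuration of K points on a hypersphere that the abstract refers to.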
Pages: 10