Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization

Cited by: 0
Authors
Frei, Spencer [1 ]
Vardi, Gal [2 ,3 ]
Bartlett, Peter L. [1 ,4 ]
Srebro, Nathan [2 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] TTI Chicago, Chicago, IL USA
[3] Hebrew Univ Jerusalem, Jerusalem, Israel
[4] Google DeepMind, London, England
Source
THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023
Keywords
Benign overfitting; Linear classifiers; Leaky ReLU networks; Implicit bias; Regression
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions that satisfy the Karush-Kuhn-Tucker (KKT) conditions for margin maximization. In this work we establish a number of settings where the satisfaction of these KKT conditions implies benign overfitting in linear classifiers and in two-layer leaky ReLU networks: the estimators interpolate noisy training data and simultaneously generalize well to test data. The settings include variants of the noisy class-conditional Gaussians considered in previous work as well as new distributional settings where benign overfitting has not been previously observed. The key ingredient in our proof is the observation that when the training data is nearly orthogonal, both linear classifiers and leaky ReLU networks satisfying the KKT conditions for their respective margin maximization problems behave like a weighted average of the training examples.
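For context, the following is a minimal sketch of the optimization behind the abstract's key observation, reconstructed here from standard max-margin theory rather than quoted from the paper (the leaky ReLU case is analogous but nonconvex). For training data $(x_i, y_i)$ with labels $y_i \in \{\pm 1\}$, the margin-maximization problem for a linear classifier is

\[
\min_{w} \ \tfrac{1}{2}\lVert w \rVert^2 \quad \text{s.t.} \quad y_i \langle w, x_i \rangle \ge 1 \ \text{for all } i,
\]

and its KKT conditions require multipliers $\lambda_i \ge 0$ such that

\[
w = \sum_i \lambda_i y_i x_i \ \text{(stationarity)}, \qquad
y_i \langle w, x_i \rangle \ge 1 \ \text{(feasibility)}, \qquad
\lambda_i \bigl( y_i \langle w, x_i \rangle - 1 \bigr) = 0 \ \text{(complementary slackness)}.
\]

When the $x_i$ are nearly orthogonal, $y_j \langle w, x_j \rangle \approx \lambda_j \lVert x_j \rVert^2$, so feasibility forces every $\lambda_j > 0$; every example is then (approximately) a support vector, and $w$ is close to a weighted average of the signed examples $y_i x_i$, which is the structural fact the abstract describes.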
Pages: 56
Related Papers
  • [1] On Margin Maximization in Linear and ReLU Networks
    Vardi, Gal
    Shamir, Ohad
    Srebro, Nathan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [2] From Tempered to Benign Overfitting in ReLU Neural Networks
    Kornowski, Guy
    Yehudai, Gilad
    Shamir, Ohad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] A Linear Combination of Classifiers via Rank Margin Maximization
    Marrocco, Claudio
    Simeone, Paolo
    Tortorella, Francesco
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2010, 6218 : 650 - 659
  • [4] Provable Robustness of ReLU networks via Maximization of Linear Regions
    Croce, Francesco
    Andriushchenko, Maksym
    Hein, Matthias
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [5] Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
    Frei, Spencer
    Chatterji, Niladri S.
    Bartlett, Peter L.
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [6] The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks
    Chatterji, Niladri S.
    Long, Philip M.
    Bartlett, Peter L.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [7] Learning Two-Layer ReLU Networks Is Nearly as Easy as Learning Linear Classifiers on Separable Data
    Yang, Qiuling
    Sadeghi, Alireza
    Wang, Gang
    Sun, Jian
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 4416 - 4427
  • [8] Linear Classifiers with the L1 Margin from a Small Number of High-Dimensional Vectors
    Bobrowski, Leon
    Lukaszuk, Tomasz
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT II, 2020, 12034 : 79 - 89
  • [9] Deep Learning Neural Networks for Emotion Classification from Text: Enhanced Leaky Rectified Linear Unit Activation and Weighted Loss
    Yang, Hui
    Alsadoon, Abeer
    Prasad, P. W. C.
    Al-Dala'in, Thair
    Rashid, Tarik A.
    Maag, Angelika
    Alsadoon, Omar Hisham
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (11) : 15439 - 15468