Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization

Cited by: 0
Authors
Frei, Spencer [1 ]
Vardi, Gal [2 ,3 ]
Bartlett, Peter L. [1 ,4 ]
Srebro, Nathan [2 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] TTI Chicago, Chicago, IL USA
[3] Hebrew Univ Jerusalem, Jerusalem, Israel
[4] Google DeepMind, London, England
Source
THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023
Keywords
Benign overfitting; Linear classifiers; Leaky ReLU networks; Implicit bias; Regression
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions that satisfy the Karush-Kuhn-Tucker (KKT) conditions for margin maximization. In this work we establish a number of settings where the satisfaction of these KKT conditions implies benign overfitting in linear classifiers and in two-layer leaky ReLU networks: the estimators interpolate noisy training data and simultaneously generalize well to test data. The settings include variants of the noisy class-conditional Gaussians considered in previous work as well as new distributional settings where benign overfitting has not been previously observed. The key ingredient in our proof is the observation that when the training data is nearly orthogonal, both linear classifiers and leaky ReLU networks satisfying the KKT conditions for their respective margin maximization problems behave like a weighted average of the training examples.
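For context, the following is a minimal sketch of the optimization behind the abstract's key observation, reconstructed here from standard max-margin theory rather than quoted from the paper (the leaky ReLU case is analogous but nonconvex). For training data $(x_i, y_i)$ with labels $y_i \in \{\pm 1\}$, the margin-maximization problem for a linear classifier is

\[
\min_{w} \ \tfrac{1}{2}\lVert w \rVert^2 \quad \text{s.t.} \quad y_i \langle w, x_i \rangle \ge 1 \ \text{for all } i,
\]

and its KKT conditions require multipliers $\lambda_i \ge 0$ such that

\[
w = \sum_i \lambda_i y_i x_i \ \text{(stationarity)}, \qquad
y_i \langle w, x_i \rangle \ge 1 \ \text{(feasibility)}, \qquad
\lambda_i \bigl( y_i \langle w, x_i \rangle - 1 \bigr) = 0 \ \text{(complementary slackness)}.
\]

When the $x_i$ are nearly orthogonal, $y_j \langle w, x_j \rangle \approx \lambda_j \lVert x_j \rVert^2$, so feasibility forces every $\lambda_j > 0$; every example is then (approximately) a support vector, and $w$ is close to a weighted average of the signed examples $y_i x_i$, which is the structural fact the abstract describes.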
Pages: 56
Related Papers
  • [1] On Margin Maximization in Linear and ReLU Networks
    Vardi, Gal
    Shamir, Ohad
    Srebro, Nathan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [2] From Tempered to Benign Overfitting in ReLU Neural Networks
    Kornowski, Guy
    Yehudai, Gilad
    Shamir, Ohad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] A Linear Combination of Classifiers via Rank Margin Maximization
    Marrocco, Claudio
    Simeone, Paolo
    Tortorella, Francesco
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2010, 6218 : 650 - 659
  • [4] Provable Robustness of ReLU networks via Maximization of Linear Regions
    Croce, Francesco
    Andriushchenko, Maksym
    Hein, Matthias
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [5] Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
    Frei, Spencer
    Chatterji, Niladri S.
    Bartlett, Peter L.
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [6] The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks
    Chatterji, Niladri S.
    Long, Philip M.
    Bartlett, Peter L.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [7] Learning Two-Layer ReLU Networks Is Nearly as Easy as Learning Linear Classifiers on Separable Data
    Yang, Qiuling
    Sadeghi, Alireza
    Wang, Gang
    Sun, Jian
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 4416 - 4427
  • [8] Linear Classifiers with the L1 Margin from a Small Number of High-Dimensional Vectors
    Bobrowski, Leon
    Lukaszuk, Tomasz
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT II, 2020, 12034 : 79 - 89
  • [9] Deep Learning Neural Networks for Emotion Classification from Text: Enhanced Leaky Rectified Linear Unit Activation and Weighted Loss
    Yang, Hui
    Alsadoon, Abeer
    Prasad, P. W. C.
    Al-Dala'in, Thair
    Rashid, Tarik A.
    Maag, Angelika
    Alsadoon, Omar Hisham
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (11) : 15439 - 15468