Two Models of Double Descent for Weak Features

Cited by: 113
Authors
Belkin, Mikhail [1]
Hsu, Daniel [2,3]
Xu, Ji [2]
Affiliations
[1] Univ Calif San Diego, Halicioglu Data Sci Inst, La Jolla, CA USA
[2] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[3] Columbia Univ, Data Sci Inst, New York, NY 10027 USA
Source
SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE | 2020, Vol. 2, No. 4
Keywords
overparameterization; least squares; inductive bias; perceptron
DOI
10.1137/20M1336072
Chinese Library Classification (CLC)
O29 [Applied Mathematics]
Discipline code
070104
Abstract
The "double descent" risk curve was proposed to qualitatively describe the out-of-sample prediction accuracy of variably parameterized machine learning models. This article provides a precise mathematical analysis for the shape of this curve in two simple data models with the least squares/least norm predictor. Specifically, it is shown that the risk peaks when the number of features p is close to the sample size n but also that the risk sometimes decreases toward its minimum as p increases beyond n. This behavior parallels some key patterns observed in large models, including modern neural networks, and is contrasted with that of "prescient" models that select features in an a priori optimal order.
Pages: 1167-1180
Page count: 14
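
As a rough illustration of the phenomenon the abstract describes, the sketch below simulates the least squares/minimum-norm predictor on synthetic data and prints the test risk as the number of features p sweeps past the sample size n. It is not the paper's exact data models; all names and parameter values (n = 40, the noise level, the feature counts) are illustrative assumptions.

```python
# Minimal sketch of a double-descent risk curve for minimum-norm least
# squares. Assumed setup, not the paper's exact models: the risk typically
# spikes near p = n and can decrease again as p grows beyond n.
import numpy as np

rng = np.random.default_rng(0)
n, D, sigma = 40, 200, 0.5                # samples, total features, noise level
beta = rng.normal(size=D) / np.sqrt(D)    # true coefficients ("weak" features)

X_train = rng.normal(size=(n, D))
y_train = X_train @ beta + sigma * rng.normal(size=n)
X_test = rng.normal(size=(2000, D))
y_test = X_test @ beta                    # noiseless targets for the test risk

for p in [10, 20, 35, 40, 45, 60, 100, 200]:
    # Fit on the first p features only; np.linalg.pinv gives the ordinary
    # least squares solution for p <= n and the minimum-norm interpolating
    # solution for p > n.
    theta = np.linalg.pinv(X_train[:, :p]) @ y_train
    risk = np.mean((X_test[:, :p] @ theta - y_test) ** 2)
    print(f"p = {p:3d}  test risk = {risk:.3f}")
```

Using the pseudoinverse for both regimes keeps the sketch to one estimator: below n it coincides with ordinary least squares, and above n it selects, among the many interpolating solutions, the one with the smallest norm.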