Two Models of Double Descent for Weak Features

Cited by: 113
Authors
Belkin, Mikhail [1]
Hsu, Daniel [2,3]
Xu, Ji [2]
Affiliations
[1] Univ Calif San Diego, Halicioglu Data Sci Inst, La Jolla, CA USA
[2] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[3] Columbia Univ, Data Sci Inst, New York, NY 10027 USA
Source
SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE | 2020, Vol. 2, No. 4
Keywords
overparameterization; least squares; inductive bias; perceptron
DOI
10.1137/20M1336072
Chinese Library Classification (CLC)
O29 [Applied Mathematics]
Discipline code
070104
Abstract
The "double descent" risk curve was proposed to qualitatively describe the out-of-sample prediction accuracy of variably parameterized machine learning models. This article provides a precise mathematical analysis for the shape of this curve in two simple data models with the least squares/least norm predictor. Specifically, it is shown that the risk peaks when the number of features p is close to the sample size n but also that the risk sometimes decreases toward its minimum as p increases beyond n. This behavior parallels some key patterns observed in large models, including modern neural networks, and is contrasted with that of "prescient" models that select features in an a priori optimal order.
Pages: 1167-1180
Page count: 14
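
As a rough illustration of the phenomenon the abstract describes, the sketch below simulates the least squares/minimum-norm predictor on synthetic data and prints the test risk as the number of features p sweeps past the sample size n. It is not the paper's exact data models; all names and parameter values (n = 40, the noise level, the feature counts) are illustrative assumptions.

```python
# Minimal sketch of a double-descent risk curve for minimum-norm least
# squares. Assumed setup, not the paper's exact models: the risk typically
# spikes near p = n and can decrease again as p grows beyond n.
import numpy as np

rng = np.random.default_rng(0)
n, D, sigma = 40, 200, 0.5                # samples, total features, noise level
beta = rng.normal(size=D) / np.sqrt(D)    # true coefficients ("weak" features)

X_train = rng.normal(size=(n, D))
y_train = X_train @ beta + sigma * rng.normal(size=n)
X_test = rng.normal(size=(2000, D))
y_test = X_test @ beta                    # noiseless targets for the test risk

for p in [10, 20, 35, 40, 45, 60, 100, 200]:
    # Fit on the first p features only; np.linalg.pinv gives the ordinary
    # least squares solution for p <= n and the minimum-norm interpolating
    # solution for p > n.
    theta = np.linalg.pinv(X_train[:, :p]) @ y_train
    risk = np.mean((X_test[:, :p] @ theta - y_test) ** 2)
    print(f"p = {p:3d}  test risk = {risk:.3f}")
```

Using the pseudoinverse for both regimes keeps the sketch to one estimator: below n it coincides with ordinary least squares, and above n it selects, among the many interpolating solutions, the one with the smallest norm.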