Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

Cited by: 12
Authors
Parhi, Rahul [1 ,2 ]
Nowak, Robert D. [1]
Affiliations
[1] Univ Wisconsin Madison, Dept Elect & Comp Engn, Madison, WI 53706 USA
[2] Ecole Polytech Fed Lausanne, Biomed Imaging Grp, CH-1015 Lausanne, Switzerland
Keywords
Estimation; Training; Biological neural networks; TV; Radon; Noise measurement; Neurons; Neural networks; ridge functions; sparsity; function approximation; nonparametric function estimation; NONPARAMETRIC REGRESSION; ASYMPTOTIC EQUIVALENCE; CONVERGENCE RATES; APPROXIMATION; BOUNDS; MULTIVARIATE; SPLINES
DOI
10.1109/TIT.2022.3208653
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
We study the problem of estimating an unknown function from noisy data using shallow ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the squared Euclidean norm of the network weights. This minimization corresponds to the common approach of training a neural network with weight decay. We quantify the performance (mean-squared error) of these neural network estimators when the data-generating function belongs to the second-order Radon-domain bounded variation space. This space of functions was recently proposed as the natural function space associated with shallow ReLU neural networks. We derive a minimax lower bound for the estimation problem for this function space and show that the neural network estimators are minimax optimal up to logarithmic factors. This minimax rate is immune to the curse of dimensionality. We quantify an explicit gap between neural networks and linear methods (which include kernel methods) by deriving a linear minimax lower bound for the estimation problem, showing that linear methods necessarily suffer the curse of dimensionality in this function space. As a result, this paper sheds light on the phenomenon that neural networks seem to break the curse of dimensionality.
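As a rough illustration of the estimator described above (the notation here is illustrative and may differ from the paper's exact parameterization, e.g., in its handling of biases or skip connections): with a width-$K$ shallow ReLU network $f_{\theta}(x) = \sum_{k=1}^{K} v_k\, \rho(\mathbf{w}_k^{\top} x - b_k)$, where $\rho(t) = \max\{0, t\}$, training with weight decay on noisy data $(x_i, y_i)_{i=1}^{n}$ corresponds to

$$\hat{f} \in \operatorname*{arg\,min}_{\theta}\; \sum_{i=1}^{n} \bigl(y_i - f_{\theta}(x_i)\bigr)^{2} \;+\; \lambda \sum_{k=1}^{K} \bigl(v_k^{2} + \lVert \mathbf{w}_k \rVert_2^{2}\bigr),$$

where $\lambda > 0$ is the regularization strength, i.e., a squared-error data fit plus a penalty proportional to the squared Euclidean norm of the network weights.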
Pages: 1125-1140
Page count: 16