Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

Cited by: 12
Authors
Parhi, Rahul [1, 2]
Nowak, Robert D. [1]
Affiliations
[1] Univ Wisconsin Madison, Dept Elect & Comp Engn, Madison, WI 53706 USA
[2] Ecole Polytech Fed Lausanne, Biomed Imaging Grp, CH-1015 Lausanne, Switzerland
Keywords
Estimation; Training; Biological neural networks; TV; Radon; Noise measurement; Neurons; Neural networks; ridge functions; sparsity; function approximation; nonparametric function estimation; NONPARAMETRIC REGRESSION; ASYMPTOTIC EQUIVALENCE; CONVERGENCE-RATES; APPROXIMATION; BOUNDS; MULTIVARIATE; SPLINES;
DOI
10.1109/TIT.2022.3208653
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
We study the problem of estimating an unknown function from noisy data using shallow ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the squared Euclidean norm of the network weights. This minimization corresponds to the common approach of training a neural network with weight decay. We quantify the performance (mean-squared error) of these neural network estimators when the data-generating function belongs to the second-order Radon-domain bounded variation space. This space of functions was recently proposed as the natural function space associated with shallow ReLU neural networks. We derive a minimax lower bound for the estimation problem for this function space and show that the neural network estimators are minimax optimal up to logarithmic factors. This minimax rate is immune to the curse of dimensionality. We quantify an explicit gap between neural networks and linear methods (which include kernel methods) by deriving a linear minimax lower bound for the estimation problem, showing that linear methods necessarily suffer the curse of dimensionality in this function space. As a result, this paper sheds light on the phenomenon that neural networks seem to break the curse of dimensionality.
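The estimator described in the abstract has a direct computational counterpart. Below is a minimal PyTorch sketch, not the authors' code, of fitting noisy samples with a shallow ReLU network by minimizing the sum of squared data-fitting errors plus a squared-Euclidean-norm penalty on the weights (weight decay); the network width, regularization strength, optimizer, and synthetic data are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch (not the authors' code) of the estimator described in the
# abstract: a shallow ReLU network fit to noisy samples by minimizing the sum
# of squared errors plus lambda times the squared Euclidean norm of the
# network weights (weight decay). Width, lambda, learning rate, and the
# synthetic data below are illustrative assumptions.
import torch

torch.manual_seed(0)

# Noisy samples y_i = f(x_i) + noise from an "unknown" function f.
n = 200
x = torch.rand(n, 1) * 2.0 - 1.0
y = torch.sin(3.0 * x) + 0.1 * torch.randn(n, 1)

# Shallow (single-hidden-layer) ReLU network.
width = 64
model = torch.nn.Sequential(
    torch.nn.Linear(1, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)

lam = 1e-3  # regularization strength (illustrative)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    data_fit = ((model(x) - y) ** 2).sum()          # squared data-fitting errors
    weight_norm = sum(p.pow(2).sum()                # squared Euclidean norm of the weights
                      for name, p in model.named_parameters() if "weight" in name)
    loss = data_fit + lam * weight_norm             # the weight-decay objective
    loss.backward()
    opt.step()

print(f"training objective after {step + 1} steps: {loss.item():.4f}")
```

The explicit penalty term is written out rather than passed as the optimizer's weight_decay argument so that the objective matches the abstract's formulation line for line.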
Pages: 1125-1140
Number of Pages: 16
Related Papers
50 items in total
  • [21] Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics. Sahs, Justin; Pyle, Ryan; Damaraju, Aneel; Caro, Josue Ortega; Tavaslioglu, Onur; Lu, Andy; Anselmi, Fabio; Patel, Ankit B. Frontiers in Artificial Intelligence, 2022, 5.
  • [22] ReLU Deep Neural Networks and Linear Finite Elements. He, Juncai; Li, Lin; Xu, Jinchao; Zheng, Chunyue. Journal of Computational Mathematics, 2020, 38(3): 502-527.
  • [23] Minimax classifiers based on neural networks. Alaiz-Rodríguez, R.; Guerrero-Curieses, A.; Cid-Sueiro, J. Pattern Recognition, 2005, 38(1): 29-39.
  • [24] Discussion of: "Nonparametric Regression Using Deep Neural Networks with ReLU Activation Function". Kutyniok, Gitta. Annals of Statistics, 2020, 48(4): 1902-1905.
  • [25] Convergence rates for shallow neural networks learned by gradient descent. Braun, Alina; Kohler, Michael; Langer, Sophie; Walk, Harro. Bernoulli, 2024, 30(1): 475-502.
  • [26] Optimal training of Mean Variance Estimation neural networks. Sluijterman, Laurens; Cator, Eric; Heskes, Tom. Neurocomputing, 2024, 597.
  • [27] Optimal approximation rate of ReLU networks in terms of width and depth. Shen, Zuowei; Yang, Haizhao; Zhang, Shijun. Journal de Mathématiques Pures et Appliquées, 2022, 157: 101-135.
  • [28] Gradient Descent Provably Escapes Saddle Points in the Training of Shallow ReLU Networks. Cheridito, Patrick; Jentzen, Arnulf; Rossmannek, Florian. Journal of Optimization Theory and Applications, 2024, 203(3): 2617-2648.
  • [29] Factor Augmented Sparse Throughput Deep ReLU Neural Networks for High Dimensional Regression. Fan, Jianqing; Gu, Yihong. Journal of the American Statistical Association, 2024, 119(548): 2680-2694.
  • [30] On Centralization and Unitization of Batch Normalization for Deep ReLU Neural Networks. Fei, Wen; Dai, Wenrui; Li, Chenglin; Zou, Junni; Xiong, Hongkai. IEEE Transactions on Signal Processing, 2024, 72: 2827-2841.