Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

Cited by: 12
Authors
Parhi, Rahul [1, 2]
Nowak, Robert D. [1]
Affiliations
[1] Univ Wisconsin Madison, Dept Elect & Comp Engn, Madison, WI 53706 USA
[2] Ecole Polytech Fed Lausanne, Biomed Imaging Grp, CH-1015 Lausanne, Switzerland
Keywords
Estimation; Training; Biological neural networks; TV; Radon; Noise measurement; Neurons; Neural networks; ridge functions; sparsity; function approximation; nonparametric function estimation; NONPARAMETRIC REGRESSION; ASYMPTOTIC EQUIVALENCE; CONVERGENCE-RATES; APPROXIMATION; BOUNDS; MULTIVARIATE; SPLINES;
DOI
10.1109/TIT.2022.3208653
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
We study the problem of estimating an unknown function from noisy data using shallow ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the squared Euclidean norm of the network weights. This minimization corresponds to the common approach of training a neural network with weight decay. We quantify the performance (mean-squared error) of these neural network estimators when the data-generating function belongs to the second-order Radon-domain bounded variation space. This space of functions was recently proposed as the natural function space associated with shallow ReLU neural networks. We derive a minimax lower bound for the estimation problem for this function space and show that the neural network estimators are minimax optimal up to logarithmic factors. This minimax rate is immune to the curse of dimensionality. We quantify an explicit gap between neural networks and linear methods (which include kernel methods) by deriving a linear minimax lower bound for the estimation problem, showing that linear methods necessarily suffer the curse of dimensionality in this function space. As a result, this paper sheds light on the phenomenon that neural networks seem to break the curse of dimensionality.
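The estimator described in the abstract has a direct computational counterpart. Below is a minimal PyTorch sketch, not the authors' code, of fitting noisy samples with a shallow ReLU network by minimizing the sum of squared data-fitting errors plus a squared-Euclidean-norm penalty on the weights (weight decay); the network width, regularization strength, optimizer, and synthetic data are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch (not the authors' code) of the estimator described in the
# abstract: a shallow ReLU network fit to noisy samples by minimizing the sum
# of squared errors plus lambda times the squared Euclidean norm of the
# network weights (weight decay). Width, lambda, learning rate, and the
# synthetic data below are illustrative assumptions.
import torch

torch.manual_seed(0)

# Noisy samples y_i = f(x_i) + noise from an "unknown" function f.
n = 200
x = torch.rand(n, 1) * 2.0 - 1.0
y = torch.sin(3.0 * x) + 0.1 * torch.randn(n, 1)

# Shallow (single-hidden-layer) ReLU network.
width = 64
model = torch.nn.Sequential(
    torch.nn.Linear(1, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)

lam = 1e-3  # regularization strength (illustrative)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    data_fit = ((model(x) - y) ** 2).sum()          # squared data-fitting errors
    weight_norm = sum(p.pow(2).sum()                # squared Euclidean norm of the weights
                      for name, p in model.named_parameters() if "weight" in name)
    loss = data_fit + lam * weight_norm             # the weight-decay objective
    loss.backward()
    opt.step()

print(f"training objective after {step + 1} steps: {loss.item():.4f}")
```

The explicit penalty term is written out rather than passed as the optimizer's weight_decay argument so that the objective matches the abstract's formulation line for line.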
Pages: 1125-1140
Number of Pages: 16
Related Papers
50 items in total
  • [21] Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics. Sahs, Justin; Pyle, Ryan; Damaraju, Aneel; Caro, Josue Ortega; Tavaslioglu, Onur; Lu, Andy; Anselmi, Fabio; Patel, Ankit B. Frontiers in Artificial Intelligence, 2022, 5.
  • [22] ReLU Deep Neural Networks and Linear Finite Elements. He, Juncai; Li, Lin; Xu, Jinchao; Zheng, Chunyue. Journal of Computational Mathematics, 2020, 38(3): 502-527.
  • [23] Minimax classifiers based on neural networks. Alaiz-Rodríguez, R.; Guerrero-Curieses, A.; Cid-Sueiro, J. Pattern Recognition, 2005, 38(1): 29-39.
  • [24] Discussion of: "Nonparametric Regression Using Deep Neural Networks with ReLU Activation Function". Kutyniok, Gitta. Annals of Statistics, 2020, 48(4): 1902-1905.
  • [25] Convergence rates for shallow neural networks learned by gradient descent. Braun, Alina; Kohler, Michael; Langer, Sophie; Walk, Harro. Bernoulli, 2024, 30(1): 475-502.
  • [26] Optimal training of Mean Variance Estimation neural networks. Sluijterman, Laurens; Cator, Eric; Heskes, Tom. Neurocomputing, 2024, 597.
  • [27] Optimal approximation rate of ReLU networks in terms of width and depth. Shen, Zuowei; Yang, Haizhao; Zhang, Shijun. Journal de Mathématiques Pures et Appliquées, 2022, 157: 101-135.
  • [28] Gradient Descent Provably Escapes Saddle Points in the Training of Shallow ReLU Networks. Cheridito, Patrick; Jentzen, Arnulf; Rossmannek, Florian. Journal of Optimization Theory and Applications, 2024, 203(3): 2617-2648.
  • [29] Factor Augmented Sparse Throughput Deep ReLU Neural Networks for High Dimensional Regression. Fan, Jianqing; Gu, Yihong. Journal of the American Statistical Association, 2024, 119(548): 2680-2694.
  • [30] On Centralization and Unitization of Batch Normalization for Deep ReLU Neural Networks. Fei, Wen; Dai, Wenrui; Li, Chenglin; Zou, Junni; Xiong, Hongkai. IEEE Transactions on Signal Processing, 2024, 72: 2827-2841.