Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

Cited by: 12
Authors
Parhi, Rahul [1 ,2 ]
Nowak, Robert D. [1]
Affiliations
[1] Univ Wisconsin Madison, Dept Elect & Comp Engn, Madison, WI 53706 USA
[2] Ecole Polytech Fed Lausanne, Biomed Imaging Grp, CH-1015 Lausanne, Switzerland
Keywords
Estimation; Training; Biological neural networks; TV; Radon; Noise measurement; Neurons; Neural networks; ridge functions; sparsity; function approximation; nonparametric function estimation; NONPARAMETRIC REGRESSION; ASYMPTOTIC EQUIVALENCE; CONVERGENCE RATES; APPROXIMATION; BOUNDS; MULTIVARIATE; SPLINES
DOI
10.1109/TIT.2022.3208653
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
We study the problem of estimating an unknown function from noisy data using shallow ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the squared Euclidean norm of the network weights. This minimization corresponds to the common approach of training a neural network with weight decay. We quantify the performance (mean-squared error) of these neural network estimators when the data-generating function belongs to the second-order Radon-domain bounded variation space. This space of functions was recently proposed as the natural function space associated with shallow ReLU neural networks. We derive a minimax lower bound for the estimation problem for this function space and show that the neural network estimators are minimax optimal up to logarithmic factors. This minimax rate is immune to the curse of dimensionality. We quantify an explicit gap between neural networks and linear methods (which include kernel methods) by deriving a linear minimax lower bound for the estimation problem, showing that linear methods necessarily suffer the curse of dimensionality in this function space. As a result, this paper sheds light on the phenomenon that neural networks seem to break the curse of dimensionality.
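As a rough illustration of the estimator described above (the notation here is illustrative and may differ from the paper's exact parameterization, e.g., in its handling of biases or skip connections): with a width-$K$ shallow ReLU network $f_{\theta}(x) = \sum_{k=1}^{K} v_k\, \rho(\mathbf{w}_k^{\top} x - b_k)$, where $\rho(t) = \max\{0, t\}$, training with weight decay on noisy data $(x_i, y_i)_{i=1}^{n}$ corresponds to

$$\hat{f} \in \operatorname*{arg\,min}_{\theta}\; \sum_{i=1}^{n} \bigl(y_i - f_{\theta}(x_i)\bigr)^{2} \;+\; \lambda \sum_{k=1}^{K} \bigl(v_k^{2} + \lVert \mathbf{w}_k \rVert_2^{2}\bigr),$$

where $\lambda > 0$ is the regularization strength, i.e., a squared-error data fit plus a penalty proportional to the squared Euclidean norm of the network weights.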
Pages: 1125-1140
Page count: 16