On size-independent sample complexity of ReLU networks

Cited: 0
Authors
Sellke, Mark [1]
Affiliations
[1] Harvard University, Department of Statistics, Cambridge, MA, USA
Keywords
Neural networks; Rademacher complexity; Generalization; Theory of computation
DOI
10.1016/j.ipl.2024.106482
Chinese Library Classification (CLC) Number
TP [Automation technology; computer technology]
Subject Classification Code
0812
Abstract
We study the sample complexity of learning ReLU neural networks from the point of view of generalization. Given norm constraints on the weight matrices, a common approach is to estimate the Rademacher complexity of the associated function class. Previous work [9] obtained a bound independent of the network size (scaling with a product of Frobenius norms), except for a factor of the square root of the depth. We give a refinement which often has no explicit depth-dependence at all.
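For context, the quantity being controlled is the empirical Rademacher complexity of the norm-constrained network class. The following is a schematic sketch, with constants and exact norm conventions omitted, of the definition and of the prior depth-dependent bound the abstract refers to; it is an illustration consistent with the abstract, not a statement of the paper's precise result.

\[
\widehat{\mathfrak{R}}_n(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\!\left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right],
\qquad \sigma_1,\dots,\sigma_n \ \text{i.i.d. uniform on } \{\pm 1\}.
\]

For depth-\(L\) ReLU networks \(f(x) = W_L \,\mathrm{ReLU}(W_{L-1} \cdots \mathrm{ReLU}(W_1 x))\) on inputs satisfying \(\|x_i\| \le B\), the earlier bound of [9] has the schematic form

\[
\widehat{\mathfrak{R}}_n(\mathcal{F}) \;\lesssim\; \frac{B \,\sqrt{L}\, \prod_{j=1}^{L} \|W_j\|_F}{\sqrt{n}},
\]

and the refinement announced in the abstract removes the explicit \(\sqrt{L}\) factor in many settings.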
Pages: 3
Related Papers
(50 records in total)
  • [31] On the uniform approximation estimation of deep ReLU networks via frequency decomposition
    Chen, Liang
    Liu, Wenjun
    AIMS MATHEMATICS, 2022, 7 (10) : 19018 - 19025
  • [32] Gradient descent optimizes over-parameterized deep ReLU networks
    Zou, Difan
    Cao, Yuan
    Zhou, Dongruo
    Gu, Quanquan
    MACHINE LEARNING, 2020, 109 (03) : 467 - 492
  • [33] New Error Bounds for Deep ReLU Networks Using Sparse Grids
    Montanelli, Hadrien
    Du, Qiang
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2019, 1 (01) : 78 - 92
  • [34] Deep ReLU networks and high-order finite element methods
    Opschoor, Joost A. A.
    Petersen, Philipp C.
    Schwab, Christoph
    ANALYSIS AND APPLICATIONS, 2020, 18 (05) : 715 - 770
  • [35] Approximate spectral decomposition of Fisher information matrix for simple ReLU networks
    Takeishi, Yoshinari
    Iida, Masazumi
    Takeuchi, Jun'ichi
    NEURAL NETWORKS, 2023, 164 : 691 - 706
  • [36] Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent
    Holzmueller, David
    Steinwart, Ingo
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [37] Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds
    Chen, Minshuo
    Jiang, Haoming
    Liao, Wenjing
    Zhao, Tuo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [38] ReLU networks as surrogate models in mixed-integer linear programs
    Grimstad, Bjarne
    Andersson, Henrik
    COMPUTERS & CHEMICAL ENGINEERING, 2019, 131
  • [39] Plateau Phenomenon in Gradient Descent Training of ReLU Networks: Explanation, Quantification, and Avoidance
    Ainsworth, Mark
    Shin, Yeonjong
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2021, 43 (05) : A3438 - A3468
  • [40] Gradient Descent Provably Escapes Saddle Points in the Training of Shallow ReLU Networks
    Cheridito, Patrick
    Jentzen, Arnulf
    Rossmannek, Florian
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2024, 203 (03) : 2617 - 2648