The Sample Complexity of One-Hidden-Layer Neural Networks

Cited: 0
Authors
Vardi, Gal [1,2,3]
Shamir, Ohad [3]
Srebro, Nathan [1]
Affiliations
[1] TTI Chicago, Chicago, IL 60637 USA
[2] Hebrew Univ Jerusalem, Jerusalem, Israel
[3] Weizmann Inst Sci, Rehovot, Israel
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
Funding
European Research Council
Keywords
BOUNDS
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm. We begin by proving that in general, controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees (independent of the network width), while a stronger Frobenius norm control is sufficient, extending and improving on previous work. Motivated by the proof constructions, we identify and analyze two important settings where (perhaps surprisingly) a mere spectral norm control turns out to be sufficient: First, when the network's activation functions are sufficiently smooth (with the result extending to deeper networks); and second, for certain types of convolutional networks. In the latter setting, we study how the sample complexity is additionally affected by parameters such as the amount of overlap between patches and the overall number of patches.
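To make the abstract's setting concrete, the following is a minimal sketch in our own notation (the symbols b, B, and n are illustrative and not taken from the paper): the class under study consists of scalar-valued one-hidden-layer networks on inputs of bounded Euclidean norm,

\[
\mathcal{H}_b \;=\; \bigl\{\, x \mapsto v^{\top}\sigma(Wx) \;:\; \lVert v\rVert_2 \le 1,\ \lVert W\rVert \le b \,\bigr\},
\qquad \lVert x\rVert_2 \le B,
\]

where \(\sigma\) is the activation applied entrywise, \(W\) is the hidden-layer weight matrix, and \(\lVert W\rVert\) denotes either the spectral norm \(\lVert W\rVert_2\) or the Frobenius norm \(\lVert W\rVert_F\). Under Frobenius-norm control and a 1-Lipschitz activation with \(\sigma(0)=0\), classical Rademacher-complexity arguments (cf. related paper [36] below) bound the uniform-convergence rate by roughly \(O(bB/\sqrt{n})\) on \(n\) samples, independently of the network width; the abstract's first result says that replacing \(\lVert W\rVert_F\) by the weaker spectral norm does not, in general, admit any such width-independent guarantee, outside the smooth-activation and convolutional settings the paper identifies.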
Pages: 12
Related Papers
50 records in total
  • [31] On the Sample Complexity for Nonoverlapping Neural Networks
    Schmitt, Michael
    Machine Learning, 1999, 37: 131-141
  • [32] A Low-complexity Visual Tracking Approach with Single Hidden Layer Neural Networks
    Dai, Liang
    Zhu, Yuesheng
    Luo, Guibo
    He, Chao
    2014 13TH INTERNATIONAL CONFERENCE ON CONTROL AUTOMATION ROBOTICS & VISION (ICARCV), 2014: 810-814
  • [33] Simultaneous approximations of multivariate functions and their derivatives by neural networks with one hidden layer
    Li, X.
    NEUROCOMPUTING, 1996, 12(4): 327-343
  • [34] Neural Networks with Marginalized Corrupted Hidden Layer
    Li, Yanjun
    Xin, Xin
    Guo, Ping
    NEURAL INFORMATION PROCESSING, PT III, 2015, 9491: 506-514
  • [35] Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
    Du, Simon S.
    Lee, Jason D.
    Tian, Yuandong
    Poczos, Barnabas
    Singh, Aarti
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018
  • [36] Size-independent sample complexity of neural networks
    Golowich, Noah
    Rakhlin, Alexander
    Shamir, Ohad
    INFORMATION AND INFERENCE: A JOURNAL OF THE IMA, 2020, 9(2): 473-504
  • [37] Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks
    Kalan, Seyed Mohammadreza Mousavi
    Fabian, Zalan
    Avestimehr, Salman
    Soltanolkotabi, Mahdi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020
  • [38] Neural networks for word recognition: Is a hidden layer necessary?
    Dandurand, Frederic
    Hannagan, Thomas
    Grainger, Jonathan
    COGNITION IN FLUX, 2010: 688-693
  • [39] Regularization of hidden layer unit response for neural networks
    Taga, K.
    Kameyama, K.
    Toraichi, K.
    2003 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS, AND SIGNAL PROCESSING, VOLS 1 AND 2, CONFERENCE PROCEEDINGS, 2003: 348-351
  • [40] HOW TO DETERMINE THE STRUCTURE OF THE HIDDEN LAYER IN NEURAL NETWORKS
    魏强
    张士军
    张勇传
    水电能源科学, 1997(01): 18-22