Efficient identification of wide shallow neural networks with biases

Cited by: 0
Authors
Fornasier, Massimo [1]
Klock, Timo [2]
Mondelli, Marco [3]
Rauchensteiner, Michael [1]
Affiliations
[1] Dept Math, Boltzmannstr 3, D-85748 Garching, Germany
[2] Deeptech Consulting, Oslo, Norway
[3] IST Austria, Campus 1, A-3400 Klosterneuburg, Austria
Keywords
RIDGE FUNCTIONS; APPROXIMATION; CAPACITY; WEIGHTS
DOI
10.1016/j.acha.2025.101749
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
The identification of the parameters of a neural network from finite samples of input-output pairs is often referred to as the teacher-student model, a popular framework for understanding training and generalization. Although the problem is NP-complete in the worst case, a rapidly growing literature has established, under suitable distributional assumptions, finite sample identification of two-layer networks with a number of neurons m = O(D), D being the input dimension. For the range D < m < D^2 the problem becomes harder, and very little is known for networks that are also parametrized by biases. This paper fills the gap by providing efficient algorithms and rigorous theoretical guarantees of finite sample identification for such wider shallow networks with biases. Our approach is based on a two-step pipeline: first, we recover the directions of the weights by exploiting second-order information; next, we identify the signs by suitable algebraic evaluations and recover the biases by empirical risk minimization via gradient descent. Numerical results demonstrate the effectiveness of our approach.
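Illustrative note (not part of the abstract): the first step relies on second-order information. A minimal sketch of why such information can reveal weight directions, under assumptions that are not taken from the paper: the network is f(x) = sum_i a_i tanh(w_i . x + b_i) with unit-norm weights w_i, and the Hessian is computed analytically rather than estimated from samples. In that setting every Hessian is a linear combination of the rank-one matrices w_i w_i^T, which the code checks numerically. The function names and the choice of tanh are hypothetical; this is not the authors' algorithm.

```python
import numpy as np

def shallow_net_hessian(x, W, b, a):
    """Analytic Hessian of f(x) = sum_i a_i * tanh(w_i . x + b_i) at x.

    Assumed (hypothetical) setup: rows of W are the weights w_i.
    """
    z = W @ x + b
    t = np.tanh(z)
    g2 = -2.0 * t * (1.0 - t**2)      # second derivative of tanh at z
    return (W.T * (a * g2)) @ W       # sum_i a_i g''(z_i) w_i w_i^T

def residual_outside_span(H, W):
    """Relative residual of H after projecting onto span{w_i w_i^T}."""
    basis = np.stack([np.outer(w, w).ravel() for w in W], axis=1)
    coef, *_ = np.linalg.lstsq(basis, H.ravel(), rcond=None)
    return np.linalg.norm(basis @ coef - H.ravel()) / np.linalg.norm(H)

rng = np.random.default_rng(0)
D, m = 6, 9                           # a width in the range D < m < D^2
W = rng.standard_normal((m, D))
W /= np.linalg.norm(W, axis=1, keepdims=True)
b = 0.5 * rng.standard_normal(m)
a = rng.choice([-1.0, 1.0], size=m)

x = rng.standard_normal(D)
H = shallow_net_hessian(x, W, b, a)
print(residual_outside_span(H, W))    # close to zero up to rounding
```

Recovering the individual directions w_i from the subspace spanned by such Hessians is the harder part and is not attempted in this sketch.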
Pages: 36