Proof of the Theory-to-Practice Gap in Deep Learning via Sampling Complexity Bounds for Neural Network Approximation Spaces

Cited by: 12
Authors
Grohs, Philipp [1 ,2 ,3 ]
Voigtlaender, Felix [1 ,4 ,5 ]
Affiliations
[1] Univ Vienna, Fac Math, Oskar Morgenstern Pl 1, A-1090 Vienna, Austria
[2] Res Platform Data Sci Uni Vienna, Währinger Str 29-S6, A-1090 Vienna, Austria
[3] Johann Radon Inst, Altenberger Str 69, A-4040 Linz, Austria
[4] Tech Univ Munich, Dept Math, Boltzmannstr 3, D-85748 Garching, Germany
[5] Catholic Univ Eichstätt-Ingolstadt KU, Math Inst Machine Learning & Data Sci MIDS, Auf der Schanz 49, D-85049 Ingolstadt, Germany
Keywords
Deep neural networks; Approximation spaces; Information-based complexity; Gelfand numbers; Theory-to-computational gaps; Randomized approximation
DOI
10.1007/s10208-023-09607-w
CLC Number
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
We study the computational complexity of (deterministic or randomized) algorithms based on point samples for approximating or integrating functions that can be well approximated by neural networks. Such algorithms (most prominently stochastic gradient descent and its variants) are used extensively in the field of deep learning. One of the most important problems in this field concerns the question of whether it is possible to realize theoretically provable neural network approximation rates by such algorithms. We answer this question in the negative by proving hardness results for the problems of approximation and integration on a novel class of neural network approximation spaces. In particular, our results confirm a conjectured and empirically observed theory-to-practice gap in deep learning. We complement our hardness results by showing that error bounds of a comparable order of convergence are (at least theoretically) achievable.
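To make the abstract's notion of algorithms "based on point samples" concrete, here is a minimal sketch of the standard information-based-complexity quantity involved; the symbols $U$, $e_m$, and $\phi$ follow common usage in that field rather than the paper's own notation. For a class $U$ of functions on $[0,1]^d$ (for instance, the unit ball of a neural network approximation space), the optimal worst-case error of deterministic, non-adaptive algorithms using $m$ point samples is

\[
  e_m(U) \;=\; \inf_{\substack{x_1,\dots,x_m \in [0,1]^d,\\ \phi \colon \mathbb{R}^m \to L^2}} \; \sup_{f \in U} \, \bigl\| f - \phi\bigl(f(x_1),\dots,f(x_m)\bigr) \bigr\|_{L^2},
\]

with an analogous expected-error version for randomized algorithms (the paper also covers adaptive sampling), and with the $L^2$ norm replaced by the error $\bigl|\int f\,dx - \phi(f(x_1),\dots,f(x_m))\bigr|$ for the integration problem. A theory-to-practice gap of the kind proved in the paper then means that $e_m(U)$ decays in $m$ at a much slower rate than the best neural network approximation rate for functions in $U$.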
Pages: 1085-1143
Number of pages: 59