DEEP NEURAL NETWORKS WITH RELU-SINE-EXPONENTIAL ACTIVATIONS BREAK CURSE OF DIMENSIONALITY IN APPROXIMATION ON HÖLDER CLASS

Cited by: 1
Authors
Jiao, Yuling [1 ,2 ]
Lai, Yanming [3 ]
Lu, Xiliang [1 ,2 ]
Wang, Fengru [3 ]
Yang, Jerry Zhijian [1 ,2 ]
Yang, Yuanyuan [3 ]
Affiliations
[1] Wuhan Univ, Sch Math & Stat, Wuhan 430072, Peoples R China
[2] Wuhan Univ, Hubei Key Lab Computat Sci, Wuhan 430072, Peoples R China
[3] Wuhan Univ, Sch Math & Stat, Wuhan 430072, Peoples R China
Funding
United States National Science Foundation;
Keywords
deep neural network; curse of dimensionality; approximation; Hölder continuous function; ERROR-BOUNDS;
DOI
10.1137/21M144431X
Chinese Library Classification
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
In this paper, we construct neural networks with ReLU, sine, and $2^x$ as activation functions. For a general continuous $f$ defined on $[0,1]^d$ with continuity modulus $\omega_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}\big(\omega_f(\sqrt{d})\cdot 2^{-M}+\omega_f(\sqrt{d}\,N^{-1})\big)$, where $M,N\in\mathbb{N}^+$ are the hyperparameters related to the widths of the networks. As a consequence, we can construct a ReLU-sine-$2^x$ network with depth 6 and width $\max\big\{2d\big\lceil\log_2\big(\sqrt{d}\,(3\mu/\epsilon)^{1/\alpha}\big)\big\rceil,\ 2\big\lceil\log_2\frac{3\mu d^{\alpha/2}}{2\epsilon}\big\rceil+2\big\}$ that approximates $f\in\mathcal{H}^\alpha_\mu([0,1]^d)$ within a given tolerance $\epsilon>0$ measured in the $L^p$ norm with $p\in[1,\infty)$, where $\mathcal{H}^\alpha_\mu([0,1]^d)$ denotes the Hölder continuous function class defined on $[0,1]^d$ with order $\alpha\in(0,1]$ and constant $\mu>0$. Therefore, the ReLU-sine-$2^x$ networks overcome the curse of dimensionality in approximation on $\mathcal{H}^\alpha_\mu([0,1]^d)$. In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, enabling us to apply stochastic gradient descent to train them.
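The abstract describes networks whose hidden units use ReLU, sine, and $2^x$ activations and notes that such networks can be trained with stochastic gradient descent because every activation is (generalized) differentiable. Below is a minimal PyTorch sketch along these lines, assuming a simple split of each hidden layer's units across the three activations; the layer count, widths, target function, and training loop are illustrative assumptions, not the paper's actual construction. The `theoretical_width` helper only evaluates the width formula quoted in the abstract for given $d$, $\alpha$, $\mu$, $\epsilon$.

```python
# Illustrative sketch only (assumed architecture, not the paper's construction).
import math
import torch
import torch.nn as nn


def theoretical_width(d: int, alpha: float, mu: float, eps: float) -> int:
    """Width from the abstract:
    max{ 2d*ceil(log2(sqrt(d)*(3*mu/eps)**(1/alpha))),
         2*ceil(log2(3*mu*d**(alpha/2)/(2*eps))) + 2 }."""
    w1 = 2 * d * math.ceil(math.log2(math.sqrt(d) * (3 * mu / eps) ** (1 / alpha)))
    w2 = 2 * math.ceil(math.log2(3 * mu * d ** (alpha / 2) / (2 * eps))) + 2
    return max(w1, w2)


class ReLUSineExp2Block(nn.Module):
    """Affine layer whose output channels are split across ReLU, sin, and 2^x."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.linear(x)
        third = z.shape[-1] // 3
        relu_part = torch.relu(z[..., :third])
        sine_part = torch.sin(z[..., third:2 * third])
        exp2_part = torch.exp(z[..., 2 * third:] * math.log(2.0))  # 2^x
        return torch.cat([relu_part, sine_part, exp2_part], dim=-1)


class ReLUSineExp2Net(nn.Module):
    """Six affine layers in total (an illustrative reading of 'depth 6')."""

    def __init__(self, d: int, width: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            [ReLUSineExp2Block(d, width)]
            + [ReLUSineExp2Block(width, width) for _ in range(4)]
        )
        self.out = nn.Linear(width, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return self.out(x)


if __name__ == "__main__":
    d, width, n = 8, 48, 1024
    print("width from the abstract's formula:", theoretical_width(d, 0.5, 1.0, 0.1))

    net = ReLUSineExp2Net(d, width)
    x = torch.rand(n, d)                       # samples from [0,1]^d
    y = torch.sin(x.sum(dim=1, keepdim=True))  # a smooth demo target
    opt = torch.optim.SGD(net.parameters(), lr=1e-3)
    for _ in range(100):                       # SGD works: activations are differentiable
        opt.zero_grad()
        loss = torch.mean((net(x) - y) ** 2)
        loss.backward()
        opt.step()
    print(f"final MSE: {loss.item():.4e}")
```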
Pages: 3635-3649
Page count: 15