DEEP NEURAL NETWORKS WITH RELU-SINE-EXPONENTIAL ACTIVATIONS BREAK CURSE OF DIMENSIONALITY IN APPROXIMATION ON HÖLDER CLASS

Cited by: 1
Authors
Jiao, Yuling [1 ,2 ]
Lai, Yanming [3 ]
Lu, Xiliang [1 ,2 ]
Wang, Fengru [3 ]
Yang, Jerry Zhijian [1 ,2 ]
Yang, Yuanyuan [3 ]
Affiliations
[1] Wuhan Univ, Sch Math & Stat, Wuhan 430072, Peoples R China
[2] Wuhan Univ, Hubei Key Lab Computat Sci, Wuhan 430072, Peoples R China
[3] Wuhan Univ, Sch Math & Stat, Wuhan 430072, Peoples R China
Funding
United States National Science Foundation;
Keywords
deep neural network; curse of dimensionality; approximation; Hölder continuous function; ERROR-BOUNDS;
DOI
10.1137/21M144431X
Chinese Library Classification
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
In this paper, we construct neural networks with ReLU, sine, and $2^x$ as activation functions. For a general continuous $f$ defined on $[0,1]^d$ with continuity modulus $\omega_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}\big(\omega_f(\sqrt{d})\cdot 2^{-M}+\omega_f(\sqrt{d}\,N^{-1})\big)$, where $M,N\in\mathbb{N}^+$ are the hyperparameters related to the widths of the networks. As a consequence, we can construct a ReLU-sine-$2^x$ network with depth 6 and width $\max\big\{2d\big\lceil\log_2\big(\sqrt{d}\,(3\mu/\epsilon)^{1/\alpha}\big)\big\rceil,\ 2\big\lceil\log_2\frac{3\mu d^{\alpha/2}}{2\epsilon}\big\rceil+2\big\}$ that approximates $f\in\mathcal{H}^\alpha_\mu([0,1]^d)$ within a given tolerance $\epsilon>0$ measured in the $L^p$ norm with $p\in[1,\infty)$, where $\mathcal{H}^\alpha_\mu([0,1]^d)$ denotes the Hölder continuous function class defined on $[0,1]^d$ with order $\alpha\in(0,1]$ and constant $\mu>0$. Therefore, the ReLU-sine-$2^x$ networks overcome the curse of dimensionality in approximation on $\mathcal{H}^\alpha_\mu([0,1]^d)$. In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, enabling us to apply stochastic gradient descent to train them.
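The abstract describes networks whose hidden units use ReLU, sine, and $2^x$ activations and notes that such networks can be trained with stochastic gradient descent because every activation is (generalized) differentiable. Below is a minimal PyTorch sketch along these lines, assuming a simple split of each hidden layer's units across the three activations; the layer count, widths, target function, and training loop are illustrative assumptions, not the paper's actual construction. The `theoretical_width` helper only evaluates the width formula quoted in the abstract for given $d$, $\alpha$, $\mu$, $\epsilon$.

```python
# Illustrative sketch only (assumed architecture, not the paper's construction).
import math
import torch
import torch.nn as nn


def theoretical_width(d: int, alpha: float, mu: float, eps: float) -> int:
    """Width from the abstract:
    max{ 2d*ceil(log2(sqrt(d)*(3*mu/eps)**(1/alpha))),
         2*ceil(log2(3*mu*d**(alpha/2)/(2*eps))) + 2 }."""
    w1 = 2 * d * math.ceil(math.log2(math.sqrt(d) * (3 * mu / eps) ** (1 / alpha)))
    w2 = 2 * math.ceil(math.log2(3 * mu * d ** (alpha / 2) / (2 * eps))) + 2
    return max(w1, w2)


class ReLUSineExp2Block(nn.Module):
    """Affine layer whose output channels are split across ReLU, sin, and 2^x."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.linear(x)
        third = z.shape[-1] // 3
        relu_part = torch.relu(z[..., :third])
        sine_part = torch.sin(z[..., third:2 * third])
        exp2_part = torch.exp(z[..., 2 * third:] * math.log(2.0))  # 2^x
        return torch.cat([relu_part, sine_part, exp2_part], dim=-1)


class ReLUSineExp2Net(nn.Module):
    """Six affine layers in total (an illustrative reading of 'depth 6')."""

    def __init__(self, d: int, width: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            [ReLUSineExp2Block(d, width)]
            + [ReLUSineExp2Block(width, width) for _ in range(4)]
        )
        self.out = nn.Linear(width, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return self.out(x)


if __name__ == "__main__":
    d, width, n = 8, 48, 1024
    print("width from the abstract's formula:", theoretical_width(d, 0.5, 1.0, 0.1))

    net = ReLUSineExp2Net(d, width)
    x = torch.rand(n, d)                       # samples from [0,1]^d
    y = torch.sin(x.sum(dim=1, keepdim=True))  # a smooth demo target
    opt = torch.optim.SGD(net.parameters(), lr=1e-3)
    for _ in range(100):                       # SGD works: activations are differentiable
        opt.zero_grad()
        loss = torch.mean((net(x) - y) ** 2)
        loss.backward()
        opt.step()
    print(f"final MSE: {loss.item():.4e}")
```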
Pages: 3635-3649
Page count: 15