Concentration Inequalities and Optimal Number of Layers for Stochastic Deep Neural Networks

Cited by: 1

Authors:

Caprio, Michele [1]

Mukherjee, Sayan [2, 3, 4, 5, 6, 7]

Affiliations:

[1] Univ Penn, PRECISE Ctr, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA

[2] Univ Leipzig, Ctr Scalable Data Analyt & Artificial Intelligence, D-04105 Leipzig, Germany

[3] Max Planck Inst Math Sci, D-04103 Leipzig, Germany

[4] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA

[5] Duke Univ, Dept Math, Durham, NC 27708 USA

[6] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA

[7] Duke Univ, Dept Biostat & Bioinformat, Durham, NC 27708 USA

Source:

IEEE Access | 2023 / Vol. 11

Keywords:

Stochastic processes; Deep learning; Biological neural networks; Artificial neural networks; Neurons; Natural language processing; Stochastic deep neural network; feedforward neural network; ReLU activation; concentration inequality; martingales; optimal stopping; optimal number of layers; MODELS

DOI:

10.1109/ACCESS.2023.3268034

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We state concentration inequalities for the output of the hidden layers of a stochastic deep neural network (SDNN), as well as for the output of the whole SDNN. These results allow us to introduce an expected classifier (EC) and to give a probabilistic upper bound for the classification error of the EC. We also state the optimal number of layers for the SDNN via an optimal stopping procedure. We apply our analysis to a stochastic version of a feedforward neural network with a ReLU activation function.


Pages: 38458-38470

Page count: 13

References: 49 entries
