On the Universally Optimal Activation Function for a Class of Residual Neural Networks

Cited by: 2
Authors
Zhao, Feng [1 ]
Huang, Shao-Lun [2 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100089, Peoples R China
[2] Tsinghua Univ, Tsinghua Berkeley Shenzhen Inst, Shenzhen 518000, Peoples R China
Source
APPLIEDMATH | 2022, Vol. 2, Issue 4
Funding
National Key R&D Program of China;
Keywords
activation function; function approximation; Hermite polynomials; perturbation analysis; NUMBER;
DOI
10.3390/appliedmath2040033
CLC classification number
O29 [Applied Mathematics];
Discipline code
070104;
Abstract
While non-linear activation functions play vital roles in artificial neural networks, it is generally unclear how the non-linearity improves the quality of function approximations. In this paper, we present a theoretical framework to rigorously analyze the performance gain from using non-linear activation functions in a class of residual neural networks (ResNets). In particular, we show that when the input features of the ResNet are uniformly chosen and mutually orthogonal, generating the ResNet output with non-linear activation functions outperforms linear activation functions on average, and the performance gain can be computed explicitly. Moreover, we show that when the activation functions are polynomials whose degree is much smaller than the dimension of the input features, the optimal activation functions can be expressed precisely in terms of Hermite polynomials. This demonstrates the role of Hermite polynomials in the function approximations of ResNets.
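The abstract's closing claim, that optimal low-degree polynomial activations are expressible in the Hermite basis, can be illustrated with a short sketch. The recurrence below generates the probabilists' Hermite polynomials He_n, which are orthogonal under the standard Gaussian measure; the `hermite_activation` helper and its coefficients are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

def hermite_he(n, x):
    """Probabilists' Hermite polynomial He_n(x) via the three-term recurrence
    He_0(x) = 1, He_1(x) = x, He_{k+1}(x) = x*He_k(x) - k*He_{k-1}(x)."""
    x = np.asarray(x, dtype=float)
    h_prev, h = np.ones_like(x), x.copy()
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

def hermite_activation(x, coeffs=(0.0, 1.0, 0.5, 0.1)):
    """A hypothetical degree-3 polynomial activation written in the Hermite
    basis: sigma(x) = sum_k coeffs[k] * He_k(x). The coefficients here are
    arbitrary placeholders chosen only for illustration."""
    return sum(c * hermite_he(k, x) for k, c in enumerate(coeffs))

# Sanity check: He_2(x) = x^2 - 1 and He_3(x) = x^3 - 3x.
x = np.array([0.0, 1.0, 2.0])
print(hermite_he(2, x))  # [-1.  0.  3.]
print(hermite_he(3, x))  # [ 0. -2.  2.]
```

Because the He_n are orthogonal with respect to the standard normal density, the coefficients of an activation in this basis decouple when inputs are Gaussian-like, which is the structural property the paper's optimality result exploits.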
Pages: 574-584 (11 pages)
References
13 entries total
[1]  
Basu A., 2016, Understanding deep neural networks with rectified linear units
[2]  
Cybenko G., 1989, Mathematics of Control, Signals, and Systems, V2, P303, DOI 10.1007/BF02551274
[3]  
dlmf.nist, NIST Digital Library of Mathematical Functions
[4]  
Eaton M.L., 1989, Group Invariance Applications in Statistics, V1, P100, DOI 10.1214/cbms/1462061037
[5]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[7]   Closed determination of the number of neurons in the hidden layer of a multi-layered perceptron network [J].
Kuri-Morales, Angel .
SOFT COMPUTING, 2017, 21 (03) :597-609
[8]   Constructive feedforward neural networks using Hermite polynomial activation functions [J].
Ma, LY ;
Khorasani, K .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2005, 16 (04) :821-833
[9]  
Ramachandran P, 2017, Arxiv, DOI [arXiv:1710.05941, 10.48550/arXiv.1710.05941, DOI 10.48550/ARXIV.1710.05941]
[10]  
Dey SS, 2019, Arxiv, DOI [arXiv:1810.03592, DOI 10.1109/TSP.2020.3039360, 10.1109/TSP.2020.3039360]