Heuristic Search for Activation Functions of Neural Networks Based on Gaussian Processes

Cited by: 0
Authors
Shi, Xinxing [1]
Chen, Jialin [1]
Wang, Lingli [1]
Affiliations
[1] Fudan Univ, Sch Microelect, State Key Lab ASIC & Syst, Shanghai, Peoples R China
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Funding
National Natural Science Foundation of China;
Keywords
Gaussian processes; neural tangent kernel; activation function; neural networks;
DOI
10.1109/IJCNN52387.2021.9533641
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the powerful expressivity of neural networks with nonlinear activation functions, the underlying mechanism of deep neural networks remains unclear. It can be proved, however, that ultra-wide neural networks are equivalent to Gaussian processes, which connects the analysis of neural networks with Bayesian statistics and kernel methods. Moreover, recent studies on infinitely wide neural networks extend this correspondence to a specific kernel, the Neural Tangent Kernel (NTK), which governs the learning dynamics of the associated neural networks. Without reference to particular weights and biases, the NTK recursively encodes the architectural information of the corresponding neural network, including the activation function at each hidden layer. Inspired by this close relationship between Gaussian processes and neural networks, we propose a heuristic search method for activation functions of sufficiently wide neural networks in the NTK regime. To obtain an elegant closed-form computation, activation functions are decomposed in the basis of Hermite polynomials, which converts the kernels of the Gaussian processes into power series. Experiments show that the obtained nonlinearities outperform other common activation functions. This work also reveals the potential utility of NTKs for guiding neural network structure search in the future.
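A minimal sketch (illustrative only, not code from the paper) of the Hermite-expansion step described in the abstract: an activation phi is expanded in probabilists' Hermite polynomials He_k, so that the Gaussian-process kernel E[phi(U) phi(V)] for unit-variance inputs with correlation rho becomes the power series sum_k a_k^2 * k! * rho^k. The use of NumPy, the ReLU example, the function names, and the degree/quadrature settings below are all assumptions made for illustration.

# Illustrative sketch: Hermite expansion of an activation and the resulting
# power-series form of the wide-network Gaussian-process kernel.
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_coeffs(phi, degree, n_quad=200):
    """Coefficients a_k = E[phi(X) He_k(X)] / k! for X ~ N(0, 1)."""
    x, w = hermegauss(n_quad)          # nodes/weights for weight exp(-x^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)       # normalize to the standard normal density
    coeffs = []
    for k in range(degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                 # selects He_k
        coeffs.append(np.sum(w * phi(x) * hermeval(x, basis)) / math.factorial(k))
    return np.array(coeffs)

def kernel_power_series(a, rho):
    """E[phi(U) phi(V)] = sum_k a_k^2 * k! * rho^k, by orthogonality of He_k."""
    k = np.arange(len(a))
    fact = np.array([math.factorial(int(i)) for i in k], dtype=float)
    return float(np.sum(a ** 2 * fact * rho ** k))

# Sanity check against a Monte Carlo estimate for ReLU and rho = 0.3 (both
# the activation and rho are arbitrary choices for this illustration).
relu = lambda x: np.maximum(x, 0.0)
a = hermite_coeffs(relu, degree=20)
rho = 0.3
rng = np.random.default_rng(0)
u = rng.standard_normal(200_000)
v = rho * u + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(200_000)
print(kernel_power_series(a, rho), np.mean(relu(u) * relu(v)))  # should roughly agree

Truncating this expansion at a finite degree is what makes the kernel computation closed-form: the kernel at each layer reduces to evaluating a polynomial in the correlation produced by the previous layer.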
Pages: 8