Energy Conservation in Infinitely Wide Neural-Networks

Cited: 0
Authors
Eguchi, Shu [1 ]
Amaba, Takafumi [1 ]
Affiliations
[1] Fukuoka Univ, Jonan-ku, 8-19-1 Nanakuma, Fukuoka 814-0180, Japan
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV | 2021 / Vol. 12894
Keywords
Wide neural-networks; Cumulative sum of parameters; Energy conservation
DOI
10.1007/978-3-030-86380-7_15
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
A three-layered neural-network (NN), which consists of an input layer, a wide hidden layer and an output layer, has three types of parameters. Two of them are pre-neuronal: the thresholds and the weights applied to the input data. The third is the post-neuronal weights applied after activation. The present paper consists of two parts. First, we consider three stochastic processes, each constructed by summing one type of parameter over all neurons at each epoch; the neuron index is regarded as a second time axis, distinct from the epoch. In a wide neural-network with the neural-tangent-kernel (NTK) parametrization, it is well known that the individual parameters hardly vary from their initial values during learning. We show, however, that the stochastic process associated with the post-neuronal parameters does vary during learning, whereas the processes associated with the pre-neuronal parameters do not. This result makes it possible to distinguish the type of a parameter by examining these stochastic processes. Second, we show that the variance (a kind of "energy") of the parameters in the infinitely wide neural-network is conserved during learning, and thus provides a conserved quantity of the learning dynamics.
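
The following is a minimal NumPy sketch of the setting described above, not code from the paper: a width-n network in the NTK parametrization f(x) = n^(-1/2) * sum_j a_j * sigma(w_j x + b_j), trained by full-batch gradient descent on a squared loss. The choice of ReLU as the activation, the toy data, and all hyperparameters are assumptions made for illustration; the printed quantities correspond to the sqrt(n)-normalized cumulative parameter sums and the empirical variance ("energy") discussed in the abstract.

import numpy as np

rng = np.random.default_rng(0)

# Width-n hidden layer in the NTK parametrization (assumed form):
#   f(x) = (1 / sqrt(n)) * sum_j a_j * relu(w_j * x + b_j)
n, m = 5000, 64
x = rng.normal(size=(m, 1))                # toy 1-d inputs (assumption)
y = np.sin(3.0 * x)                        # toy regression targets (assumption)

w = rng.normal(size=(1, n))                # pre-neuronal weights
b = rng.normal(size=(1, n))                # pre-neuronal thresholds
a = rng.normal(size=(n, 1))                # post-neuronal weights
w0, b0, a0 = w.copy(), b.copy(), a.copy()

lr = 0.1
for epoch in range(1000):
    h = np.maximum(x @ w + b, 0.0)                     # hidden activations, (m, n)
    err = h @ a / np.sqrt(n) - y                       # residual f(x) - y, (m, 1)
    back = (err @ a.T) * (h > 0) / (m * np.sqrt(n))    # backprop through relu, (m, n)
    a -= lr * (h.T @ err) / (m * np.sqrt(n))           # gradient step on the mean squared loss
    w -= lr * (x.T @ back)
    b -= lr * back.sum(axis=0, keepdims=True)

# Cumulative sums over neurons in the Brownian scaling sum / sqrt(n):
# per the paper's result, the post-neuronal sum drifts by an O(1) amount,
# while the pre-neuronal sums stay close to their initial values.
for name, p, p0 in [("w", w, w0), ("b", b, b0), ("a", a, a0)]:
    print(f"normalized drift of sum of {name}: {(p - p0).sum() / np.sqrt(n):+.4f}")

# The empirical variance ("energy") of each parameter vector is
# approximately conserved as the width n grows.
print("energies before:", (w0**2).mean(), (b0**2).mean(), (a0**2).mean())
print("energies after: ", (w**2).mean(), (b**2).mean(), (a**2).mean())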
Pages: 177-189
Page count: 13