On the Exact Computation of Linear Frequency Principle Dynamics and Its Generalization

Cited by: 2
Authors
Luo, Tao [1 ,2 ]
Ma, Zheng [1 ,2 ]
Xu, Zhi-Qin John [2 ,3 ]
Zhang, Yaoyu [2 ,3 ,4 ]
Affiliations
[1] Shanghai Jiao Tong Univ, MOE LSC, Inst Nat Sci, Sch Math Sci, CMA Shanghai, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, MOE LSC, Inst Nat Sci, Sch Math Sci, Shanghai 200240, Peoples R China
[4] Shanghai Ctr Brain Sci & Brain Inspired Technol, Shanghai 200240, Peoples R China
Source
SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE | 2022, Vol. 4, No. 4
Funding
National Key R&D Program of China; National Natural Science Foundation of China; Natural Science Foundation of Shanghai;
Keywords
deep learning; frequency principle; Fourier analysis; two-layer neural network; neural tangent kernel; optimization; DEEP NEURAL-NETWORK; GENERALIZATION ERROR;
DOI
10.1137/21M1444400
CLC Number
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
Recent works have revealed the intriguing frequency-principle (F-Principle) phenomenon that deep neural networks (DNNs) fit the target function from low to high frequency during training, which provides insight into the training and generalization behavior of DNNs in complex tasks. In this paper, through analysis of an infinite-width two-layer NN in the neural tangent kernel regime, we derive the exact differential equation, namely the linear frequency-principle (LFP) model, governing the evolution of the NN output function in the frequency domain during training. Our exact computation applies to general activation functions and makes no assumptions on the size or distribution of the training data. The LFP model reveals that higher frequencies evolve polynomially or exponentially slower than lower frequencies, depending on the smoothness/regularity of the activation function. We further bridge the gap between training dynamics and generalization by proving that the LFP model implicitly minimizes a frequency-principle norm (FP-norm) of the learned function, under which higher frequencies are penalized more severely, in proportion to the inverse of their evolution rate. Finally, we derive an a priori generalization error bound controlled by the FP-norm of the target function, which provides a theoretical justification for the empirical observation that DNNs often generalize well on low-frequency functions.
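For orientation, a minimal schematic of the dynamics the abstract describes; the diagonal form and the symbols \hat{h}, \hat{f}, r(\xi) are illustrative simplifications, not the paper's exact LFP model or notation. Writing \hat{h}(\xi, t) for the Fourier transform of the NN output and \hat{f}(\xi) for that of the target, a diagonalized frequency-domain flow reads

\[ \partial_t \hat{h}(\xi, t) = -r(\xi)\,\bigl(\hat{h}(\xi, t) - \hat{f}(\xi)\bigr), \qquad \hat{h}(\xi, t) = \hat{f}(\xi)\bigl(1 - e^{-r(\xi)\, t}\bigr) \ \text{ for } \ \hat{h}(\xi, 0) = 0, \]

where the rate r(\xi) decays polynomially in |\xi| for activations of finite smoothness (e.g., ReLU) and exponentially in |\xi| for smooth activations, so higher frequencies converge slower. In the same schematic spirit, an FP-norm weights the spectrum by the inverse rate,

\[ \|h\|_{\mathrm{FP}}^2 = \int r(\xi)^{-1}\, |\hat{h}(\xi)|^2 \, d\xi, \]

so the most slowly learned frequencies are the most heavily penalized, which is the mechanism behind the generalization bound stated in the abstract.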
Pages: 1272-1292
Page count: 21