On the Exact Computation of Linear Frequency Principle Dynamics and Its Generalization

Cited by: 2
Authors
Luo, Tao [1 ,2 ]
Ma, Zheng [1 ,2 ]
Xu, Zhi-Qin John [2 ,3 ]
Zhang, Yaoyu [2 ,3 ,4 ]
Affiliations
[1] Shanghai Jiao Tong Univ, MOE LSC, Inst Nat Sci, Sch Math Sci, CMA Shanghai, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, MOE LSC, Inst Nat Sci, Sch Math Sci, Shanghai 200240, Peoples R China
[4] Shanghai Ctr Brain Sci & Brain Inspired Technol, Shanghai 200240, Peoples R China
Source
SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE | 2022, Vol. 4, No. 4
Funding
National Key R&D Program of China; National Natural Science Foundation of China; Natural Science Foundation of Shanghai;
Keywords
deep learning; frequency principle; Fourier analysis; two-layer neural network; neural tangent kernel; optimization; DEEP NEURAL-NETWORK; GENERALIZATION ERROR;
DOI
10.1137/21M1444400
CLC Number
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
Recent works have revealed the intriguing frequency-principle (F-Principle) phenomenon that deep neural networks (DNNs) fit the target function from low to high frequency during training, which provides insight into the training and generalization behavior of DNNs in complex tasks. In this paper, through analysis of an infinite-width two-layer NN in the neural tangent kernel regime, we derive the exact differential equation, namely the linear frequency-principle (LFP) model, governing the evolution of the NN output function in the frequency domain during training. Our exact computation applies to general activation functions and makes no assumptions on the size or distribution of the training data. The LFP model reveals that higher frequencies evolve polynomially or exponentially slower than lower frequencies, depending on the smoothness/regularity of the activation function. We further bridge the gap between training dynamics and generalization by proving that the LFP model implicitly minimizes a frequency-principle norm (FP-norm) of the learned function, under which higher frequencies are penalized more severely, in proportion to the inverse of their evolution rate. Finally, we derive an a priori generalization error bound controlled by the FP-norm of the target function, which provides a theoretical justification for the empirical observation that DNNs often generalize well on low-frequency functions.
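For orientation, a minimal schematic of the dynamics the abstract describes; the diagonal form and the symbols \hat{h}, \hat{f}, r(\xi) are illustrative simplifications, not the paper's exact LFP model or notation. Writing \hat{h}(\xi, t) for the Fourier transform of the NN output and \hat{f}(\xi) for that of the target, a diagonalized frequency-domain flow reads

\[ \partial_t \hat{h}(\xi, t) = -r(\xi)\,\bigl(\hat{h}(\xi, t) - \hat{f}(\xi)\bigr), \qquad \hat{h}(\xi, t) = \hat{f}(\xi)\bigl(1 - e^{-r(\xi)\, t}\bigr) \ \text{ for } \ \hat{h}(\xi, 0) = 0, \]

where the rate r(\xi) decays polynomially in |\xi| for activations of finite smoothness (e.g., ReLU) and exponentially in |\xi| for smooth activations, so higher frequencies converge slower. In the same schematic spirit, an FP-norm weights the spectrum by the inverse rate,

\[ \|h\|_{\mathrm{FP}}^2 = \int r(\xi)^{-1}\, |\hat{h}(\xi)|^2 \, d\xi, \]

so the most slowly learned frequencies are the most heavily penalized, which is the mechanism behind the generalization bound stated in the abstract.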
Pages: 1272-1292
Page count: 21