Exact learning dynamics of deep linear networks with prior knowledge

Cited by: 1
Authors
Domine, Clementine C. [1 ]
Braun, Lukas [2 ]
Fitzgerald, James E. [3 ]
Saxe, Andrew M. [1 ,4 ,5 ]
Affiliations
[1] UCL, Gatsby Computat Neurosci Unit, London WC1N 3AR, England
[2] Univ Oxford, Dept Expt Psychol, Oxford, England
[3] Janelia Res Campus, Howard Hughes Med Inst, Ashburn, VA USA
[4] UCL, Sainsbury Wellcome Ctr, London, England
[5] CIFAR, Toronto, ON, Canada
Source
JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT | 2023, Vol. 2023, No. 11
Funding
Wellcome Trust (UK); UK Medical Research Council
Keywords
deep learning; learning theory; machine learning; CONNECTIONIST MODELS; NEURAL-NETWORKS; SYSTEMS;
DOI
10.1088/1742-5468/ad01b8
Chinese Library Classification
O3 [Mechanics]
Discipline Classification Code
08; 0801
Abstract
Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights. However, few theoretical results have precisely linked prior knowledge to learning dynamics. Here we derive exact solutions to the dynamics of learning with rich prior knowledge in deep linear networks by generalising Fukumizu's matrix Riccati solution (Fukumizu 1998). We obtain explicit expressions for the evolving network function, hidden representational similarity, and neural tangent kernel over training for a broad class of initialisations and tasks. The expressions reveal a class of task-independent initialisations that radically alter learning dynamics from slow non-linear dynamics to fast exponential trajectories while converging to a global optimum with identical representational similarity, dissociating learning trajectories from the structure of initial internal representations. We characterise how network weights dynamically align with task structure, rigorously justifying why previous solutions successfully described learning from small initial weights without incorporating their fine-scale structure. Finally, we discuss the implications of these findings for continual learning, reversal learning and learning of structured knowledge. Taken together, our results provide a mathematical toolkit for understanding the impact of prior knowledge on deep learning.
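To make the abstract's setup concrete, the following is a minimal numerical sketch (not the authors' code): a two-layer linear network trained by Euler-discretised gradient flow from small, task-agnostic random weights, with the learned singular values of the composite map compared against the classic sigmoidal small-initialisation solution tau da/dt = 2a(s - a) (Saxe et al 2014), the special case that this paper generalises via the matrix Riccati approach. The network sizes, time constants, and the effective initial mode strength a0 are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (illustrative, not the authors' code): two-layer linear
# network y = W2 @ W1 @ x trained on a whitened-input task whose
# input-output correlation matrix Sigma has known singular values.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 8, 8
tau, dt, steps = 1.0, 1e-3, 8000   # time constant, Euler step, iterations

# Teacher correlation matrix Sigma = U diag(s_true) V^T (rank 3 here).
s_true = np.array([5.0, 3.0, 1.5])
U, _ = np.linalg.qr(rng.standard_normal((n_out, n_out)))
V, _ = np.linalg.qr(rng.standard_normal((n_in, n_in)))
Sigma = (U[:, :3] * s_true) @ V[:, :3].T

# Small random, task-agnostic initial weights: the regime in which the
# sigmoidal mode-by-mode description is known to apply approximately.
W1 = 1e-3 * rng.standard_normal((n_hid, n_in))
W2 = 1e-3 * rng.standard_normal((n_out, n_hid))

sv_traj = []
for t in range(steps):
    E = Sigma - W2 @ W1              # error in the composite map
    W1 += (dt / tau) * (W2.T @ E)    # Euler-discretised gradient flow
    W2 += (dt / tau) * (E @ W1.T)
    if t % 100 == 0:
        sv_traj.append(np.linalg.svd(W2 @ W1, compute_uv=False)[:3])

# Classic sigmoidal trajectory per mode, solving tau * da/dt = 2a(s - a):
# a(t) = s a0 e^{2st/tau} / (s + a0 (e^{2st/tau} - 1)).
a0 = 1e-6                            # effective initial mode strength (assumed)
T = dt * steps
for k, s in enumerate(s_true):
    e = np.exp(2 * s * T / tau)
    a_T = s * a0 * e / (s + a0 * (e - 1))
    print(f"mode {k}: simulated {sv_traj[-1][k]:.3f}, "
          f"sigmoidal {a_T:.3f}, target {s:.1f}")
```

Each mode rises along an S-shaped curve on a timescale set by tau/(2s), so stronger task modes are learned first; the paper's matrix Riccati solution extends such closed-form trajectories from this small-weight regime to a broad class of rich, structured initialisations.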
Pages: 48
References
60 entries in total
  • [1] Anonymous. 2017. Advances in Neural Information Processing Systems.
  • [2] Arora R. 2020. Theory of Deep Learning. Princeton University.
  • [3] Arora S. 2019. 33rd Conference on Neural Information Processing Systems, Vol. 32.
  • [4] Arora S. 2019. arXiv:1810.02281.
  • [5] Arora S. 2018. Proceedings of Machine Learning Research, Vol. 80.
  • [6] Asanuma H, Takagi S, Nagano Y, Yoshida Y, Igarashi Y, Okada M. 2021. Statistical mechanical analysis of catastrophic forgetting in continual learning with teacher and student networks. Journal of the Physical Society of Japan, 90(10).
  • [7] Atanasov A. 2022. International Conference on Learning Representations.
  • [8] Bahri Y, Kadmon J, Pennington J, Schoenholz S S, Sohl-Dickstein J, Ganguli S. 2020. Statistical mechanics of deep learning. Annual Review of Condensed Matter Physics, 11: 501-528.
  • [9] Baldi P, Hornik K. 1989. Neural networks and principal component analysis: learning from examples without local minima. Neural Networks, 2(1): 53-58.
  • [10] Bengio Y. 2009. Proceedings of the 26th Annual International Conference on Machine Learning, p. 41. DOI 10.1145/1553374.1553380.