On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

Citations: 0
Authors
Huang, Wei [1 ]
Du, Weitao [2 ]
Da Xu, Richard Yi [1 ]
Affiliations
[1] Univ Technol Sydney, Sydney, Australia
[2] Northwestern Univ, Evanston, IL USA
Source
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021 | 2021
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
The prevailing view is that orthogonal weights are crucial to enforcing dynamical isometry and speeding up training. The increase in learning speed that results from orthogonal initialization in linear networks is well established. The same is believed to hold for non-linear networks when the dynamical isometry condition is satisfied, but the training dynamics behind this contention have not been thoroughly explored. In this work, we study the dynamics of ultra-wide networks with orthogonal initialization across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs), via the neural tangent kernel (NTK). Through a series of propositions and lemmas, we prove that two NTKs, one corresponding to Gaussian weights and one to orthogonal weights, are equal when the network width is infinite. Further, during training, the NTK of an orthogonally-initialized infinite-width network remains constant. This implies that orthogonal initialization cannot speed up training in the NTK (lazy training) regime, contrary to prevailing belief. To explore under what circumstances orthogonality can accelerate training, we conduct a thorough empirical investigation outside the NTK regime. We find that when the hyper-parameters are set so that the nonlinear activations operate in their linear regime, orthogonal initialization improves the learning speed under a large learning rate or at large depth.
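The paper's first claim, that the NTKs induced by Gaussian and orthogonal weights coincide at infinite width, can be checked numerically. The following is a minimal NumPy sketch (not the authors' code): it draws a Haar-random orthogonal weight matrix via QR decomposition and compares the empirical NTK of a one-hidden-layer tanh network under the two initializations; the function names, the tanh nonlinearity, and the single-hidden-layer architecture are illustrative assumptions.

```python
import numpy as np

def orthogonal_init(n, rng):
    """Haar-random n x n orthogonal matrix, scaled by sqrt(n) so each
    entry has unit variance, matching the Gaussian parameterization."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    q *= np.sign(np.diag(r))  # sign fix makes the distribution Haar-uniform
    return np.sqrt(n) * q

def empirical_ntk(x1, x2, W, v):
    """Empirical NTK of the scalar network
        f(x) = v . tanh(W x / sqrt(d)) / sqrt(n),
    computed from the closed-form gradients w.r.t. W and v."""
    d, n = x1.shape[0], v.shape[0]
    h1 = np.tanh(W @ x1 / np.sqrt(d))
    h2 = np.tanh(W @ x2 / np.sqrt(d))
    k_v = h1 @ h2 / n                            # contribution of grad w.r.t. v
    k_W = (x1 @ x2 / d) * np.sum(v**2 * (1.0 - h1**2) * (1.0 - h2**2)) / n
    return k_v + k_W                             # k_W: contribution of grad w.r.t. W

rng = np.random.default_rng(0)
n = 1024                       # width; the two kernels agree as n -> infinity
x = rng.standard_normal(n)
v = rng.standard_normal(n)
k_gauss = empirical_ntk(x, x, rng.standard_normal((n, n)), v)
k_orth = empirical_ntk(x, x, orthogonal_init(n, rng), v)
print(k_gauss, k_orth)         # nearly equal at this width, per the paper's result
```

At moderate widths the two kernel values already agree up to O(1/sqrt(n)) fluctuations, consistent with the infinite-width equality proved in the paper.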
Pages: 2577-2583
Page count: 7
Related Papers
50 records in total
  • [1] Neural Tangent Kernel Analysis of Deep Narrow Neural Networks
    Lee, Jongmin
    Choi, Joo Young
    Ryu, Ernest K.
    No, Albert
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [2] Neural Tangent Kernel at Initialization: Linear Width Suffices
    Banerjee, Arindam
    Cisneros-Velarde, Pedro
    Zhu, Libin
    Belkin, Mikhail
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216: 110-118
  • [3] Sparsity-Aware Orthogonal Initialization of Deep Neural Networks
    Esguerra, Kiara
    Nasir, Muneeb
    Tang, Tong Boon
    Tumian, Afidalina
    Ho, Eric Tatt Wei
    IEEE ACCESS, 2023, 11: 74165-74181
  • [4] Spectral Analysis of the Neural Tangent Kernel for Deep Residual Networks
    Belfer, Yuval
    Geifman, Amnon
    Galun, Meirav
    Basri, Ronen
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25: 1-49
  • [5] Neural Tangent Kernel: Convergence and Generalization in Neural Networks
    Jacot, Arthur
    Gabriel, Franck
    Hongler, Clement
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018
  • [6] Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks
    Nguyen, Quynh
    Mondelli, Marco
    Montufar, Guido
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [7] "Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach
    Gu, Lingyu
    Du, Yongqi
    Zhang, Yuan
    Xie, Di
    Pu, Shiliang
    Qiu, Robert C.
    Liao, Zhenyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [8] On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks
    Yang, Hongru
    Wang, Zhangyang
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023
  • [9] Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? - A Neural Tangent Kernel Perspective
    Huang, Kaixuan
    Wang, Yuqing
    Tao, Molei
    Zhao, Tuo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020
  • [10] Spectra of the Conjugate Kernel and Neural Tangent Kernel for Linear-Width Neural Networks
    Fan, Zhou
    Wang, Zhichao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020