On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

Times Cited: 0
Authors
Huang, Wei [1 ]
Du, Weitao [2 ]
Da Xu, Richard Yi [1 ]
Affiliations
[1] Univ Technol Sydney, Sydney, Australia
[2] Northwestern Univ, Evanston, IL USA
Source
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021 | 2021
Keywords
DOI
Not available
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
The prevailing thinking is that orthogonal weights are crucial to enforcing dynamical isometry and speeding up training. The increase in learning speed that results from orthogonal initialization in linear networks has been well established. However, while the same is believed to hold for non-linear networks when the dynamical isometry condition is satisfied, the training dynamics behind this contention have not been thoroughly explored. In this work, we study the dynamics of ultra-wide networks with orthogonal initialization, across a range of architectures including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs), through the Neural Tangent Kernel (NTK). Through a series of propositions and lemmas, we prove that two NTKs, one corresponding to Gaussian weights and one to orthogonal weights, are equal when the network width is infinite. Further, during training, the NTK of an orthogonally-initialized infinite-width network should theoretically remain constant. This suggests that orthogonal initialization cannot speed up training in the NTK (lazy training) regime, contrary to the prevailing thinking. To explore under what circumstances orthogonality can accelerate training, we conduct a thorough empirical investigation outside the NTK regime. We find that when the hyper-parameters are set so that the nonlinear activations operate in their linear regime, orthogonal initialization can improve the learning speed with a large learning rate or large depth.
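To make the abstract's central comparison concrete, the following is a minimal, illustrative sketch (not code from the paper; the two-layer tanh network, the NTK parameterization, and helper names such as empirical_ntk and orthogonal_init are assumptions introduced here). It estimates the empirical neural tangent kernel Theta(x, x') = <grad_theta f(x), grad_theta f(x')> at initialization under Gaussian and orthogonal weights, and prints both estimates for increasing width n; consistent with the equality of the two limiting kernels stated above, the estimates should agree increasingly well as n grows.

```python
import numpy as np

def empirical_ntk(x1, x2, W, v):
    """Empirical NTK  Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>
    for the two-layer network f(x) = (1/sqrt(n)) v^T tanh(W x / sqrt(d))."""
    n, d = W.shape

    def grads(x):
        pre = W @ x / np.sqrt(d)                      # pre-activations, shape (n,)
        act = np.tanh(pre)                            # hidden activations
        dact = 1.0 - act ** 2                         # tanh'(pre)
        g_v = act / np.sqrt(n)                        # gradient w.r.t. readout v
        g_W = np.outer(v * dact, x) / np.sqrt(n * d)  # gradient w.r.t. W
        return g_v, g_W

    gv1, gW1 = grads(x1)
    gv2, gW2 = grads(x2)
    return float(gv1 @ gv2 + np.sum(gW1 * gW2))

def gaussian_init(n, d, rng):
    # i.i.d. N(0, 1) entries; the 1/sqrt(d) scaling lives in the forward pass
    return rng.standard_normal((n, d))

def orthogonal_init(n, d, rng):
    # For n >= d: orthonormal columns via QR, rescaled so W^T W = n * I_d,
    # matching the column norms of the Gaussian case in expectation.
    q, r = np.linalg.qr(rng.standard_normal((n, d)))
    q = q * np.sign(np.diag(r))   # sign fix for a Haar-distributed frame
    return q * np.sqrt(n)

rng = np.random.default_rng(0)
d = 16
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)

for n in (64, 1024, 16384):
    estimates = {}
    for name, init in (("gaussian", gaussian_init), ("orthogonal", orthogonal_init)):
        draws = [empirical_ntk(x1, x2, init(n, d, rng), rng.standard_normal(n))
                 for _ in range(20)]
        estimates[name] = float(np.mean(draws))
    print(n, estimates)  # the two estimates should approach a common value as n grows
```

Averaging over a few weight draws reduces finite-width fluctuations; the agreement at large n is a finite-width illustration of the infinite-width equality, not a proof of it.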
Pages: 2577-2583
Number of pages: 7