On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

Citations: 0
Authors
Huang, Wei [1 ]
Du, Weitao [2 ]
Da Xu, Richard Yi [1 ]
Affiliations
[1] Univ Technol Sydney, Sydney, Australia
[2] Northwestern Univ, Evanston, IL USA
Source
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021 | 2021
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
The prevailing view is that orthogonal weights are crucial to enforcing dynamical isometry and speeding up training. The increase in learning speed that results from orthogonal initialization in linear networks is well established. The same is believed to hold for non-linear networks when the dynamical isometry condition is satisfied, but the training dynamics behind this contention have not been thoroughly explored. In this work, we study the dynamics of ultra-wide networks with orthogonal initialization across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs), via the neural tangent kernel (NTK). Through a series of propositions and lemmas, we prove that two NTKs, one corresponding to Gaussian weights and one to orthogonal weights, are equal when the network width is infinite. Further, during training, the NTK of an orthogonally-initialized infinite-width network remains constant. This implies that orthogonal initialization cannot speed up training in the NTK (lazy training) regime, contrary to prevailing belief. To explore under what circumstances orthogonality can accelerate training, we conduct a thorough empirical investigation outside the NTK regime. We find that when the hyper-parameters are set so that the nonlinear activations operate in their linear regime, orthogonal initialization improves the learning speed under a large learning rate or at large depth.
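The paper's first claim, that the NTKs induced by Gaussian and orthogonal weights coincide at infinite width, can be checked numerically. The following is a minimal NumPy sketch (not the authors' code): it draws a Haar-random orthogonal weight matrix via QR decomposition and compares the empirical NTK of a one-hidden-layer tanh network under the two initializations; the function names, the tanh nonlinearity, and the single-hidden-layer architecture are illustrative assumptions.

```python
import numpy as np

def orthogonal_init(n, rng):
    """Haar-random n x n orthogonal matrix, scaled by sqrt(n) so each
    entry has unit variance, matching the Gaussian parameterization."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    q *= np.sign(np.diag(r))  # sign fix makes the distribution Haar-uniform
    return np.sqrt(n) * q

def empirical_ntk(x1, x2, W, v):
    """Empirical NTK of the scalar network
        f(x) = v . tanh(W x / sqrt(d)) / sqrt(n),
    computed from the closed-form gradients w.r.t. W and v."""
    d, n = x1.shape[0], v.shape[0]
    h1 = np.tanh(W @ x1 / np.sqrt(d))
    h2 = np.tanh(W @ x2 / np.sqrt(d))
    k_v = h1 @ h2 / n                            # contribution of grad w.r.t. v
    k_W = (x1 @ x2 / d) * np.sum(v**2 * (1.0 - h1**2) * (1.0 - h2**2)) / n
    return k_v + k_W                             # k_W: contribution of grad w.r.t. W

rng = np.random.default_rng(0)
n = 1024                       # width; the two kernels agree as n -> infinity
x = rng.standard_normal(n)
v = rng.standard_normal(n)
k_gauss = empirical_ntk(x, x, rng.standard_normal((n, n)), v)
k_orth = empirical_ntk(x, x, orthogonal_init(n, rng), v)
print(k_gauss, k_orth)         # nearly equal at this width, per the paper's result
```

At moderate widths the two kernel values already agree up to O(1/sqrt(n)) fluctuations, consistent with the infinite-width equality proved in the paper.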
Pages: 2577-2583
Page count: 7
Related Papers
50 records in total
  • [1] Neural Tangent Kernel Analysis of Deep Narrow Neural Networks
    Lee, Jongmin
    Choi, Joo Young
    Ryu, Ernest K.
    No, Albert
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [2] Neural Tangent Kernel at Initialization: Linear Width Suffices
    Banerjee, Arindam
    Cisneros-Velarde, Pedro
    Zhu, Libin
    Belkin, Mikhail
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216: 110-118
  • [3] Sparsity-Aware Orthogonal Initialization of Deep Neural Networks
    Esguerra, Kiara
    Nasir, Muneeb
    Tang, Tong Boon
    Tumian, Afidalina
    Ho, Eric Tatt Wei
    IEEE ACCESS, 2023, 11: 74165-74181
  • [4] Spectral Analysis of the Neural Tangent Kernel for Deep Residual Networks
    Belfer, Yuval
    Geifman, Amnon
    Galun, Meirav
    Basri, Ronen
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25: 1-49
  • [5] Neural Tangent Kernel: Convergence and Generalization in Neural Networks
    Jacot, Arthur
    Gabriel, Franck
    Hongler, Clement
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018
  • [6] Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks
    Nguyen, Quynh
    Mondelli, Marco
    Montufar, Guido
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [7] "Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach
    Gu, Lingyu
    Du, Yongqi
    Zhang, Yuan
    Xie, Di
    Pu, Shiliang
    Qiu, Robert C.
    Liao, Zhenyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [8] On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks
    Yang, Hongru
    Wang, Zhangyang
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023
  • [9] Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? - A Neural Tangent Kernel Perspective
    Huang, Kaixuan
    Wang, Yuqing
    Tao, Molei
    Zhao, Tuo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020
  • [10] Spectra of the Conjugate Kernel and Neural Tangent Kernel for Linear-Width Neural Networks
    Fan, Zhou
    Wang, Zhichao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020