Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks

Cited by: 0
Authors
Qin, Zhen [1 ]
Tan, Xuwei [1 ]
Zhu, Zhihui [1 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Keywords
Convergence analysis; deep neural networks; orthonormal structure; Riemannian optimization; optimization
DOI
10.1109/LSP.2024.3374085
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes
0808; 0809
Abstract
Enforcing an orthonormal or isometric property on the weight matrices has been shown to improve the training of deep neural networks by mitigating gradient explosion/vanishing and increasing the robustness of the learned networks. However, despite its practical success, theoretical analysis of orthonormality in neural networks is still lacking; for example, it is unclear how orthonormality affects the convergence of the training process. In this letter, we aim to bridge this gap by providing a convergence analysis for training orthonormal deep linear neural networks. Specifically, we show that Riemannian gradient descent with an appropriate initialization converges at a linear rate when training orthonormal deep linear networks with a class of loss functions. Unlike existing works that enforce orthonormal weight matrices for all layers, our approach excludes this requirement for one layer, which is crucial for establishing the convergence guarantee. Our results shed light on how increasing the number of hidden layers impacts the convergence speed. Experimental results validate our theoretical analysis.
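The setting described in the abstract can be illustrated with a minimal sketch: a deep linear network whose first L-1 layers are kept orthonormal via Riemannian gradient descent on the orthogonal group (tangent-space projection followed by a QR retraction), while the last layer is updated by plain gradient descent. All sizes, the step size, the squared loss, and the QR retraction are assumptions for illustration only; this is not the authors' exact algorithm, initialization, or loss class.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, L = 8, 50, 3  # layer width, sample count, depth (toy sizes, assumed)
X = rng.standard_normal((d, n))
Y = rng.standard_normal((d, n))

def qr_retract(W):
    """Map a matrix back onto the orthogonal group via sign-fixed QR."""
    Q, R = np.linalg.qr(W)
    return Q * np.sign(np.diag(R))

# Orthonormal initialization for the first L-1 layers; the last layer is free.
Ws = [qr_retract(rng.standard_normal((d, d))) for _ in range(L - 1)]
Ws.append(rng.standard_normal((d, d)))

def product(mats):
    """Return W_k @ ... @ W_1 for the list [W_1, ..., W_k]."""
    P = np.eye(d)
    for W in mats:
        P = W @ P
    return P

def loss(Ws):
    return 0.5 / n * np.linalg.norm(product(Ws) @ X - Y) ** 2

init_loss, eta = loss(Ws), 0.05
for _ in range(1000):
    R = product(Ws) @ X - Y
    # Euclidean gradients of the squared loss w.r.t. each layer.
    grads = [product(Ws[i + 1:]).T @ R @ (product(Ws[:i]) @ X).T / n
             for i in range(L)]
    for i in range(L - 1):
        # Riemannian step: project the gradient onto the tangent space of
        # the orthogonal group at Ws[i], then retract back via QR.
        W, G = Ws[i], grads[i]
        sym = (W.T @ G + G.T @ W) / 2
        Ws[i] = qr_retract(W - eta * (G - W @ sym))
    Ws[L - 1] = Ws[L - 1] - eta * grads[L - 1]  # plain GD on the free layer

final_loss = loss(Ws)
```

Because the product of orthogonal matrices is itself orthogonal, leaving the last layer unconstrained (as the abstract requires) keeps the end-to-end map fully expressive, so the loss can still decrease freely while every constrained layer remains exactly orthonormal after each retraction.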
Pages: 795-799
Page count: 5