Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks

Cited by: 0

Authors:

Qin, Zhen [1]

Tan, Xuwei [1]

Zhu, Zhihui [1]

Affiliations:

[1] Ohio State University, Department of Computer Science and Engineering, Columbus, OH 43210, USA

Source:

IEEE Signal Processing Letters | 2024 / Vol. 31

Keywords:

Convergence analysis; deep neural networks; orthonormal structure; Riemannian optimization; optimization

DOI:

10.1109/LSP.2024.3374085

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Enforcing an orthonormal or isometric property on the weight matrices has been shown to enhance the training of deep neural networks by mitigating exploding and vanishing gradients and by increasing the robustness of the learned networks. However, despite its practical performance, the theoretical analysis of orthonormality in neural networks is still lacking; for example, it is unclear how orthonormality affects the convergence of the training process. In this letter, we aim to bridge this gap by providing a convergence analysis for training orthonormal deep linear neural networks. Specifically, we show that Riemannian gradient descent with an appropriate initialization converges at a linear rate for training orthonormal deep linear neural networks with a class of loss functions. Unlike existing works that enforce orthonormal weight matrices for all the layers, our approach drops this requirement for one layer, which is crucial for establishing the convergence guarantee. Our results shed light on how increasing the number of hidden layers can impact the convergence speed. Experimental results validate our theoretical analysis.
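The setup described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything below is an assumption made for illustration: the squared Frobenius loss, square layers of equal size, the choice of the last layer as the unconstrained one, the polar retraction, the random initialization, and the function names `chain`, `tangent_project`, `polar_retract`, and `train`.

```python
import numpy as np


def chain(mats, d):
    """Multiply layers in network order: mats[-1] @ ... @ mats[0]."""
    out = np.eye(d)
    for W in mats:
        out = W @ out
    return out


def tangent_project(W, G):
    """Project a Euclidean gradient G onto the tangent space at orthonormal W."""
    WtG = W.T @ G
    return G - W @ (WtG + WtG.T) / 2.0


def polar_retract(A):
    """Return the orthonormal factor of the polar decomposition of A."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt


def train(T, n_layers=3, lr=0.05, steps=300, seed=0):
    """Fit W_N ... W_1 to T; W_1..W_{N-1} stay orthonormal, W_N is free."""
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    # Random orthogonal initialization for the constrained layers (assumption).
    Ws = [polar_retract(rng.standard_normal((d, d))) for _ in range(n_layers - 1)]
    # Small random initialization for the single unconstrained layer (assumption).
    Ws.append(0.1 * rng.standard_normal((d, d)))
    losses = []
    for _ in range(steps):
        E = chain(Ws, d) - T                  # residual of the end-to-end map
        losses.append(0.5 * np.sum(E ** 2))   # loss before this update
        new_Ws = []
        for i, W in enumerate(Ws):
            above = chain(Ws[i + 1:], d)      # layers applied after W
            below = chain(Ws[:i], d)          # layers applied before W
            G = above.T @ E @ below.T         # Euclidean gradient w.r.t. W
            if i < n_layers - 1:
                # Riemannian step: project, move, retract to the manifold.
                new_Ws.append(polar_retract(W - lr * tangent_project(W, G)))
            else:
                # Plain gradient step on the free layer.
                new_Ws.append(W - lr * G)
        Ws = new_Ws
    return Ws, losses
```

A possible usage, with a hypothetical 4 x 4 target scaled to unit spectral norm: build `T = rng.standard_normal((4, 4))`, divide it by `np.linalg.norm(T, 2)`, call `Ws, losses = train(T)`, and inspect `losses` to see how the error evolves while `W.T @ W` stays close to the identity for every constrained layer.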


Pages: 795-799

Page count: 5

