Linear RNNs Provably Learn Linear Dynamical Systems

Cited by: 0
Authors
Wang, Lifu [1 ]
Wang, Tianyu [1 ]
Yi, Shengwei [1 ]
Shen, Bo [2 ]
Hu, Bo [2 ]
Cao, Xing [2 ]
Affiliations
[1] China Information Technology Security Evaluation Center (CNITSEC), Beijing, People's Republic of China
[2] Beijing Jiaotong University, School of Electronic and Information Engineering, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Non-convex optimization; Machine learning; Linear systems; Recurrent neural networks;
DOI
10.1007/s10957-024-02521-3
Chinese Library Classification
C93 [Management Science]; O22 [Operations Research]
Discipline Codes
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
In this paper, we investigate the learning abilities of linear recurrent neural networks (RNNs) trained using gradient descent. We present a theoretical guarantee demonstrating that linear RNNs can learn any stable linear dynamical system with polynomial complexity. Importantly, the derived generalization error bound is independent of the episode length. For any stable linear system whose transition matrix C is characterized by a parameter $\rho_C$ related to the spectral radius, we prove that, despite the non-convexity of the parameter optimization loss, a linear RNN can learn the system with sample and time complexity polynomial in $\frac{1}{1-\rho_C}$, provided that the RNN has sufficient width. Notably, the required width of the hidden layers does not depend on the length of the input sequence. This work provides the first rigorous theoretical foundation for learning linear RNNs. Our findings suggest that linear RNNs can efficiently learn complex dynamical systems, paving the way for further research into the learning capabilities of more general RNN architectures.
Pages: 488-528 (41 pages)
Related Papers
29 records in total
[1] Allen-Zhu Z., 2019, Proceedings of Machine Learning Research, Vol. 97.
[2] Allen-Zhu Z., 2019, Advances in Neural Information Processing Systems, Vol. 32.
[3] Belanger D., 2015, Proceedings of Machine Learning Research, Vol. 37, p. 833.
[4] Campi M.C., Weyer E. Finite sample properties of system identification methods. IEEE Transactions on Automatic Control, 2002, 47(8): 1329-1334.
[5] Cao Y., 2019, Advances in Neural Information Processing Systems, Vol. 32.
[6] Chen T.S., 2015, 2015 European Control Conference (ECC), p. 1291, DOI 10.1109/ECC.2015.7330716.
[7] Dowler D.A., 2013, Thesis, Brigham Young University.
[8] Du S.S., 2018, Proceedings of the International Conference on Learning Representations, p. 1.
[9] Du S., 2019, Proceedings of Machine Learning Research, p. 1675, DOI 10.48550/arXiv.1811.03804.
[10] Grewal M.S., Andrews A.P. Applications of Kalman Filtering in Aerospace 1960 to the Present. IEEE Control Systems Magazine, 2010, 30(3): 69-78.