Derivation and analysis of parallel-in-time neural ordinary differential equations

Times cited: 4
Author
Lorin, E. [1,2]
Affiliations
[1] Univ Montreal, Ctr Rech Math, Montreal, PQ H3T 1J4, Canada
[2] Carleton Univ, Sch Math & Stat, Ottawa, ON K1S 5B6, Canada
Keywords
Residual Neural Network; Neural Ordinary Differential Equations; Parareal method; Parallelism-in-time
DOI
10.1007/s10472-020-09702-6
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The introduction in 2015 of Residual Neural Networks (RNN) and ResNet allowed for outstanding improvements in the performance of learning algorithms for evolution problems containing a "large" number of layers. Continuous-depth RNN-like models, called Neural Ordinary Differential Equations (NODE), were then introduced in 2019. The latter have a constant memory cost and avoid the a priori specification of the number of hidden layers. In this paper, we derive and analyze a parallel (in parameter and in time) version of the NODE, which potentially allows for a more efficient implementation than a standard/naive parallelization of NODEs with respect to the parameters only. We expect this approach to be relevant whenever a very large number of processors is available, or when dealing with high-dimensional ODE systems. Moreover, when implicit ODE solvers are used, nonlinear systems must be solved, for instance with Newton's algorithm, which in turn requires solutions to linear systems of up to cubic complexity. Since the proposed approach reduces the overall number of time-steps through an iterative increase of the accuracy order of the ODE solvers, it also reduces the number of linear systems to solve, and hence benefits from a scaling effect.
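The parareal iteration underlying the paper's parallelism-in-time can be sketched generically as follows. This is a minimal illustration on a toy linear ODE, not the author's implementation: the right-hand side `f`, the choice of forward-Euler propagators, and all step counts are hypothetical.

```python
import numpy as np

def f(t, u):
    # Stand-in right-hand side of the ODE du/dt = f(t, u): simple linear decay.
    return -u

def coarse(u, t0, t1):
    # Cheap, low-accuracy propagator G: one forward-Euler step per subinterval.
    return u + (t1 - t0) * f(t0, u)

def fine(u, t0, t1, m=20):
    # Expensive, high-accuracy propagator F: m forward-Euler substeps.
    dt = (t1 - t0) / m
    for j in range(m):
        u = u + dt * f(t0 + j * dt, u)
    return u

def parareal(u0, T=1.0, N=10, K=5):
    ts = np.linspace(0.0, T, N + 1)
    # Initial guess: a sequential sweep with the coarse propagator.
    U = [u0]
    for n in range(N):
        U.append(coarse(U[n], ts[n], ts[n + 1]))
    for k in range(K):
        # The fine solves on the N subintervals are independent of one
        # another, which is where the time-parallelism comes from.
        F = [fine(U[n], ts[n], ts[n + 1]) for n in range(N)]
        Unew = [u0]
        for n in range(N):
            # Parareal correction: new coarse + (fine - old coarse).
            Unew.append(coarse(Unew[n], ts[n], ts[n + 1])
                        + F[n] - coarse(U[n], ts[n], ts[n + 1]))
        U = Unew
    return ts, np.array(U)

ts, U = parareal(np.array([1.0]))
```

Each iteration the corrected trajectory moves toward the fine solution while only the cheap coarse sweep remains sequential; in the paper this mechanism is applied to the NODE system rather than a scalar test equation.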
Pages: 1035-1059
Number of pages: 25