Optimal memory-aware backpropagation of deep join networks

Times cited: 15
Authors
Beaumont, Olivier [1]
Herrmann, Julien [1]
Pallez, Guillaume [1]
Shilova, Alena [1]
Affiliation
[1] Univ Bordeaux, INRIA, Labri, Talence, France
Source
PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES | 2020 / Vol. 378 / No. 2166
Keywords
backpropagation; memory; pebble game; algorithm
DOI
10.1098/rsta.2019.0049
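Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences (general)]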
Subject classification codes
07; 0710; 09
Abstract
The memory required to train deep learning models can prevent users from considering large models and large batch sizes. In this work, we propose to use techniques from memory-aware scheduling and automatic differentiation (AD) to execute a backpropagation graph within a bounded memory budget, at the cost of extra recomputations. The case of a single homogeneous chain, i.e. a network whose stages are all identical and form a chain, is well understood, and optimal solutions have been proposed in the AD literature. The networks encountered in deep learning practice are much more diverse, both in shape and in heterogeneity. Here, we define the class of backpropagation graphs and extend the set of such graphs for which a solution minimizing the total number of recomputations can be computed in polynomial time. In particular, we consider join graphs, which correspond to models such as Siamese or cross-modal networks. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
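To make the homogeneous-chain case concrete, here is a minimal sketch of the classical dynamic program from the AD checkpointing literature (the recurrence underlying binomial checkpointing, as in Revolve). It is not the join-graph algorithm of this article. It assumes a simplified cost model: every forward step costs 1, backward steps cost nothing, backward step i needs the activation x_{i-1}, and there are s checkpoint slots, one of which holds the input x_0. Minimizing forward-step executions under this model is equivalent to minimizing recomputations. The function name `min_forward_steps` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_forward_steps(l: int, s: int) -> int:
    """Minimal number of forward-step executions needed to backpropagate
    through a homogeneous chain of l identical stages with s checkpoint slots.

    Simplified cost model (an assumption for illustration):
      * forward step i maps x_{i-1} to x_i and costs 1; backward steps cost 0;
      * backward step i needs x_{i-1}; backward steps run in order l, ..., 1;
      * x_0 starts stored and occupies one of the s slots; the state that is
        currently being advanced does not occupy a slot.
    """
    if l < 1 or s < 1:
        raise ValueError("need l >= 1 and s >= 1")
    if l == 1:
        # x_0 is already stored, so backward step 1 can run immediately.
        return 0
    if s == 1:
        # Only x_0 is stored: advance to x_{l-1}, run backward step l,
        # then restart the remaining chain of length l-1 from x_0.
        return (l - 1) + min_forward_steps(l - 1, 1)
    best = None
    for j in range(1, l):
        # Advance j steps and checkpoint x_j; reverse steps j+1..l with the
        # remaining s-1 slots; free x_j; then reverse steps 1..j with s slots.
        cost = j + min_forward_steps(l - j, s - 1) + min_forward_steps(j, s)
        if best is None or cost < best:
            best = cost
    return best


if __name__ == "__main__":
    # Trade-off between memory (slots) and recomputation for a 6-stage chain.
    for slots in (1, 2, 3, 5):
        print(slots, min_forward_steps(6, slots))
```

For a chain of l = 6 stages this returns 15, 8, 7 and 5 forward steps for s = 1, 2, 3 and 5 slots: with a single slot the cost grows quadratically (l(l-1)/2), while with s >= l-1 slots no recomputation is needed and only the l-1 forward steps of the initial pass remain. The memoized recursion runs in O(l^2 s) time, which is adequate for illustration but is not how production checkpointing tools are implemented.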
Pages: 14