Dynamic Control Flow in Large-Scale Machine Learning

Cited by: 35
Authors
Yu, Yuan [1,2]
Abadi, Martin [2]
Barham, Paul [2]
Brevdo, Eugene [2]
Burrows, Mike [2]
Davis, Andy [2]
Dean, Jeff [2]
Ghemawat, Sanjay [3]
Harley, Tim [4]
Hawkins, Peter [2]
Isard, Michael [2]
Kudlur, Manjunath [5]
Monga, Rajat [2]
Murray, Derek [2]
Zheng, Xiaoqiang [2]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
[2] Google Brain, Mountain View, CA USA
[3] Google, Mountain View, CA USA
[4] DeepMind, London, England
[5] Cerebras Systems, Los Altos, CA USA
Source
EuroSys '18: Proceedings of the Thirteenth EuroSys Conference | 2018
DOI
10.1145/3190508.3190551
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dynamic control flow. These applications benefit from the ability to make rapid control-flow decisions across a set of computing devices in a distributed system. For performance, scalability, and expressiveness, a machine learning system must support dynamic control flow in distributed and heterogeneous environments. This paper presents a programming model for distributed machine learning that supports dynamic control flow. We describe the design of the programming model, and its implementation in TensorFlow, a distributed machine learning system. Our approach extends the use of dataflow graphs to represent machine learning models, offering several distinctive features. First, the branches of conditionals and bodies of loops can be partitioned across many machines to run on a set of heterogeneous devices, including CPUs, GPUs, and custom ASICs. Second, programs written in our model support automatic differentiation and distributed gradient computations, which are necessary for training machine learning models that use control flow. Third, our choice of non-strict semantics enables multiple loop iterations to execute in parallel across machines, and to overlap compute and I/O operations. We have done our work in the context of TensorFlow, and it has been used extensively in research and production. We evaluate it using several real-world applications, and demonstrate its performance and scalability.
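The programming model the abstract describes underpins TensorFlow's user-facing control-flow operators, tf.cond and tf.while_loop. As a minimal sketch of how that model looks in user code (written against the TensorFlow 1.x graph API, tf.compat.v1 in TF 2; the loop body and constants here are illustrative, not taken from the paper), a data-dependent loop is expressed as a predicate/body pair of Python functions, and tf.gradients differentiates through the resulting graph:

import tensorflow as tf  # TensorFlow 1.x graph-mode API

# A data-dependent loop: double x until it reaches 100. The iteration
# count is decided at run time by the dataflow graph itself, not at
# graph-construction time.
i0 = tf.constant(0)
x0 = tf.constant(1.0)

def cond(i, x):
    return x < 100.0          # loop predicate, evaluated each iteration

def body(i, x):
    return i + 1, x * 2.0     # one iteration: bump counter, double x

n_iters, result = tf.while_loop(cond, body, [i0, x0])

# A data-dependent conditional built into the same graph.
capped = tf.cond(result > 150.0,
                 lambda: result / 2.0,
                 lambda: result)

# Automatic differentiation through the loop: result = x0 * 2**n_iters,
# so d(result)/d(x0) is 2**n_iters.
grad = tf.gradients(result, x0)

with tf.Session() as sess:
    print(sess.run([n_iters, result, capped, grad]))
    # [7, 128.0, 128.0, [128.0]]

Because iterations are graph operations rather than Python-level loop steps, the runtime can partition the loop body across devices and, under the non-strict semantics the abstract mentions, execute independent iterations in parallel.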
Pages: 15