On Model Parallelization and Scheduling Strategies for Distributed Machine Learning

Cited by: 0
Authors
Lee, Seunghak [1 ]
Kim, Jin Kyu [1 ]
Zheng, Xun [1 ]
Ho, Qirong [2 ]
Gibson, Garth A. [1 ]
Xing, Eric P. [1 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] ASTAR, Inst Infocomm Res, Singapore 138632, Singapore
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014) | 2014 / Vol. 27
DOI: Not available
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Distributed machine learning has typically been approached from a data-parallel perspective, where big data are partitioned across multiple workers and an algorithm is executed concurrently over different data subsets under various synchronization schemes to ensure speed-up and/or correctness. A sibling problem that has received relatively less attention is how to ensure efficient and correct model-parallel execution of ML algorithms, where the parameters of an ML program are partitioned across different workers and undergo concurrent iterative updates. We argue that model and data parallelism impose rather different challenges for system design, algorithmic adjustment, and theoretical analysis. In this paper, we develop a system for model parallelism, STRADS, that provides a programming abstraction for scheduling parameter updates by discovering and leveraging changing structural properties of ML programs. STRADS enables a flexible tradeoff between scheduling efficiency and fidelity to intrinsic dependencies within the models, and improves memory efficiency of distributed ML. We demonstrate the efficacy of model-parallel algorithms implemented on STRADS versus popular implementations for topic modeling, matrix factorization, and Lasso.
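The scheduling idea in the abstract — concurrently updating only those model parameters that are weakly dependent — can be illustrated on a toy Lasso problem. The sketch below is hypothetical and is not the STRADS implementation: the function names `pick_block` and `model_parallel_lasso`, the 0.5 correlation threshold, and the choice to simulate the "concurrent" block update sequentially from a shared residual are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(z, lam):
    # Proximal operator of the L1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def pick_block(X, coords, block_size, rng, max_corr=0.5):
    # Greedily pick a block of coordinates whose (unit-norm) feature columns
    # are nearly uncorrelated, so their concurrent updates barely interfere.
    # The 0.5 threshold is an illustrative choice, not a STRADS parameter.
    block = []
    for j in rng.permutation(coords):
        if all(abs(X[:, j] @ X[:, k]) < max_corr for k in block):
            block.append(j)
        if len(block) == block_size:
            break
    return block

def model_parallel_lasso(X, y, lam=0.1, block_size=2, iters=200, seed=0):
    # Lasso via coordinate descent: each round, a scheduler selects a block of
    # weakly dependent coordinates that model-parallel workers could update
    # concurrently; here the block is updated from one shared residual to
    # mimic that concurrency in a single process.
    X = X / np.linalg.norm(X, axis=0)   # unit-norm feature columns
    n, d = X.shape
    beta = np.zeros(d)
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        block = pick_block(X, np.arange(d), block_size, rng)
        r = y - X @ beta                # residual shared by the whole block
        for j in block:                 # conceptually concurrent updates
            beta[j] = soft_threshold(beta[j] + X[:, j] @ r, lam)
    return beta
```

Because blocked coordinates are chosen to be nearly orthogonal, updating them from the same stale residual stays close to sequential coordinate descent, which is the efficiency-versus-fidelity tradeoff the abstract describes.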
Pages: 9