FlexPS: Flexible Parallelism Control in Parameter Server Architecture

Times cited: 39
Authors
Huang, Yuzhen [1 ]
Jin, Tatiana [1 ]
Wu, Yidi [1 ]
Cai, Zhenkun [1 ]
Yan, Xiao [1 ]
Yang, Fan [1 ]
Li, Jinfeng [1 ]
Guo, Yuying [1 ]
Cheng, James [1 ]
Affiliations
[1] The Chinese University of Hong Kong, Department of Computer Science and Engineering, Hong Kong, China
Source
Proceedings of the VLDB Endowment | 2018 / Vol. 11 / No. 5
Keywords
OPTIMIZATION; FRAMEWORK; EFFICIENT; ALGORITHM;
DOI
10.1145/3177732.3177734
Chinese Library Classification (CLC) number
TP [automation technology; computer technology]
Subject classification code
0812
Abstract
As a general abstraction for coordinating the distributed storage and access of model parameters, the parameter server (PS) architecture enables distributed machine learning to handle large datasets and high-dimensional models. Many systems, such as Parameter Server and Petuum, have been developed based on the PS architecture and are widely used in practice. However, none of these systems supports changing the parallelism at runtime, which is crucial for the efficient execution of machine learning tasks with dynamic workloads. We propose a new system, called FlexPS, which introduces a novel multi-stage abstraction to support flexible parallelism control. With the multi-stage abstraction, a machine learning task can be mapped to a series of stages, and the parallelism for each stage can be set according to its workload. Optimizations such as a stage scheduler, a stage-aware consistency controller, and direct model transfer are proposed to improve the efficiency of multi-stage machine learning in FlexPS. As a general and complete PS system, FlexPS also incorporates many optimizations that are not limited to multi-stage machine learning. We conduct extensive experiments using a variety of machine learning workloads, showing that FlexPS achieves significant speedups and resource savings compared with state-of-the-art PS systems such as Petuum and Multiverso.
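The multi-stage abstraction summarized above can be pictured with a small sketch. The Python snippet below is purely illustrative and is not the FlexPS API: the Stage class, the run_task function, and the stage names and worker counts are hypothetical, and the per-worker loop only stands in for launching distributed workers against a parameter server.

    # Illustrative sketch only -- not the actual FlexPS API.
    # A machine learning task is expressed as a sequence of stages, and each
    # stage declares its own parallelism to match its workload.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Stage:
        name: str                 # human-readable stage name (hypothetical field)
        num_workers: int          # parallelism chosen for this stage's workload
        task: Callable[[], None]  # computation run by each worker in the stage

    def run_task(stages: List[Stage]) -> None:
        # Run stages one after another; in a real PS system the model would stay
        # on the servers (or be transferred directly) between stages.
        for stage in stages:
            print(f"stage '{stage.name}': {stage.num_workers} workers")
            for _ in range(stage.num_workers):  # stand-in for distributed workers
                stage.task()

    # Example: a compute-heavy training stage followed by a lighter evaluation stage.
    run_task([
        Stage("train", num_workers=32, task=lambda: None),
        Stage("evaluate", num_workers=4, task=lambda: None),
    ])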
Pages: 566-579
Number of pages: 14