Swift Machine Learning Model Serving Scheduling: A Region Based Reinforcement Learning Approach

Cited by: 19
Authors
Qin, Heyang [1]
Zawad, Syed [1]
Zhou, Yanqi [2]
Yang, Lei [1]
Zhao, Dongfang [1]
Yan, Feng [1]
Affiliations
[1] Univ Nevada, Reno, NV 89557 USA
[2] Google Brain, Mountain View, CA USA
Source
PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS | 2019
Funding
National Science Foundation (USA);
Keywords
Model Inference; machine-learning-as-a-service (MLaaS); parallelism parameter tuning; reinforcement learning; workload scheduling; service-level-objective (SLO); NEURAL-NETWORKS;
DOI
10.1145/3295500.3356164
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The success of machine learning has driven the growth of Machine-Learning-as-a-Service (MLaaS): deploying trained machine learning (ML) models in the cloud to provide low-latency inference services at scale. To meet latency Service-Level-Objectives (SLOs), judicious parallelization at both the request and operation levels is critically important. However, existing ML systems (e.g., TensorFlow) and cloud ML serving platforms (e.g., SageMaker) are SLO-agnostic and rely on users to manually configure the parallelism. To provide low-latency ML serving, this paper proposes a swift machine learning serving scheduling framework with a novel Region-based Reinforcement Learning (RRL) approach. RRL can efficiently identify the optimal parallelism configuration under different workloads by estimating the performance of similar configurations from that of known ones. We show both theoretically and experimentally that the RRL approach can outperform state-of-the-art approaches by finding near-optimal solutions over 8 times faster while reducing inference latency by up to 79.0% and SLO violations by up to 49.9%.
Pages: 23