Scheduling Multi-Server Jobs With Sublinear Regrets via Online Learning

Cited by: 0
Authors
Zhao, Hailiang [1 ,2 ]
Deng, Shuiguang [1 ,2 ]
Xiang, Zhengzhe [3 ]
Yan, Xueqiang [4 ]
Yin, Jianwei [2 ]
Dustdar, Schahram [5 ]
Zomaya, Albert Y. [6 ]
Affiliations
[1] Zhejiang Univ, Hainan Inst, Sanya 572025, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[3] Hangzhou City Univ, Sch Comp & Comp Sci, Hangzhou 310015, Peoples R China
[4] Huawei Technol Co Ltd, Shanghai 201206, Peoples R China
[5] Tech Univ Wien, Distributed Syst Grp, A1040 Vienna, Austria
[6] Univ Sydney, Sch Comp Sci, Sydney, NSW 2006, Australia
Keywords
Computational modeling; Processor scheduling; Training; Electronic mail; Computer science; Bipartite graph; Analytical models; Multi-server job; online gradient ascent; online scheduling; regret analysis
DOI
10.1109/TSC.2023.3303344
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
Multi-server jobs, which request multiple computing resources and hold onto them throughout their execution, dominate modern computing clusters. When allocating multi-type resources to several co-located multi-server jobs simultaneously in online settings, it is difficult to balance the parallel computation gain against the internal communication overhead, in addition to the resource contention between jobs. To study this computation-communication tradeoff, we model the computation gain as the speedup in job completion time when a job is executed in parallel on multiple computing instances, and fit it with utilities of different concavities. Meanwhile, we take the dominant communication overhead as the penalty to be subtracted from the gain. To achieve a better gain-overhead tradeoff, we formulate a cumulative reward maximization program and design an online algorithm, named OgaSched, to schedule multi-server jobs. OgaSched allocates the multi-type resources to each arriving job along the ascending direction of the reward gradients. It employs several parallel sub-procedures to accelerate its computation, which greatly reduces its complexity. We prove that OgaSched achieves sublinear regret with general concave rewards. We also conduct extensive trace-driven simulations to validate the performance of OgaSched. The results demonstrate that OgaSched outperforms four widely used heuristics by 11.33%, 7.75%, 13.89%, and 13.44%, respectively.
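The abstract describes OgaSched only at a high level, so the sketch below is a minimal illustration of the general projected online-gradient-ascent template it builds on, not the authors' implementation: at each round the allocation is moved along the reward gradient and then projected back onto a capacity-feasible set. All identifiers here (reward_gradient, project_to_capacity, eta) are hypothetical placeholders introduced for this example.

import numpy as np

def project_to_capacity(x, capacity):
    # Clip the allocation into the box [0, capacity] per resource type;
    # a stand-in for projection onto the true feasible set.
    return np.clip(x, 0.0, capacity)

def oga_schedule(reward_gradient, capacity, num_rounds, eta=0.1):
    # Generic online gradient ascent over resource allocations.
    # reward_gradient(t, x) should return the gradient of the round-t
    # concave reward (parallelism gain minus communication penalty)
    # evaluated at allocation x. Illustrative template only.
    x = np.zeros_like(capacity, dtype=float)            # start from an empty allocation
    trajectory = []
    for t in range(num_rounds):
        g = reward_gradient(t, x)                       # observe the reward gradient
        x = project_to_capacity(x + eta * g, capacity)  # ascend, then project
        trajectory.append(x.copy())
    return trajectory

With a suitably chosen step size (for example, eta on the order of 1/sqrt(T)), this template is the standard route to O(sqrt(T)) regret for concave rewards, which is the kind of sublinear guarantee the abstract refers to.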
Pages: 1168-1180
Number of pages: 13