Scheduling Jobs across Geo-Distributed Datacenters with Max-Min Fairness

Cited by: 34
Authors
Chen, Li [1]
Liu, Shuhao [1]
Li, Baochun [1]
Li, Bo [2]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S3G4, Canada
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Source
IEEE Transactions on Network Science and Engineering | 2019, Vol. 6, No. 3
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Geo-distributed datacenter networks; wide-area big data analytics; scheduling; fairness
DOI
10.1109/TNSE.2018.2795580
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
It has become routine for large volumes of data to be generated, stored, and processed across geographically distributed datacenters. To run a single data analytic job on such geo-distributed data, recent research proposed to distribute its tasks across datacenters, considering both data locality and network bandwidth across datacenters. Yet, it remains an open problem in the more general case, where multiple analytic jobs need to fairly share the resources at these geo-distributed datacenters. In this paper, we focus on the problem of assigning tasks belonging to multiple jobs across datacenters, with the specific objective of achieving max-min fairness across jobs sharing these datacenters, in terms of their job completion times. We formulate this problem as a lexicographical minimization problem, which is challenging to solve in practice due to its inherent multi-objective and discrete nature. To address these challenges, we iteratively solve its single-objective subproblems, which can be transformed to equivalent linear programming (LP) problems to be efficiently solved, thanks to their favorable properties. As a highlight of this paper, we have designed and implemented our proposed solution as a fair job scheduler based on Apache Spark, a modern data processing framework. With extensive evaluations of our real-world implementation on Amazon EC2 and large-scale simulations, we have shown convincing evidence that max-min fairness has been achieved and the worst job completion time has been significantly improved using our new job scheduler.
Pages: 488-500
Page count: 13
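
The lexicographical minimization named in the abstract can be illustrated on a toy instance. The sketch below is not the paper's LP-based method: it simply enumerates every feasible task placement and keeps the one whose job completion times, sorted from worst to best, are lexicographically smallest. The task model (each task has one running time per datacenter, a job finishes when its slowest task finishes, each datacenter has a fixed number of slots) and the names `job_times` and `lexmin_assignment` are assumptions made for illustration.

```python
from itertools import product


def job_times(assignment, tasks, n_jobs):
    """Completion time of each job: the slowest of its tasks (assumed model)."""
    times = [0] * n_jobs
    for (job, costs), dc in zip(tasks, assignment):
        times[job] = max(times[job], costs[dc])
    return times


def lexmin_assignment(tasks, capacity, n_jobs):
    """Brute-force search for the max-min fair placement.

    tasks:    list of (job index, [running time at each datacenter])
    capacity: number of task slots available at each datacenter
    Returns (assignment, completion times sorted worst-first).
    """
    n_dcs = len(capacity)
    best, best_key = None, None
    for assignment in product(range(n_dcs), repeat=len(tasks)):
        # Skip placements that exceed a datacenter's slot capacity.
        if any(assignment.count(dc) > capacity[dc] for dc in range(n_dcs)):
            continue
        # Worst-first ordering: minimizing this vector lexicographically
        # minimizes the worst job first, then the second worst, and so on.
        key = sorted(job_times(assignment, tasks, n_jobs), reverse=True)
        if best_key is None or key < best_key:
            best, best_key = assignment, key
    return best, best_key


# Two jobs with two tasks each, two datacenters with two slots each.
tasks = [(0, [2, 5]), (0, [3, 4]), (1, [1, 6]), (1, [2, 3])]
best, key = lexmin_assignment(tasks, capacity=[2, 2], n_jobs=2)
print(best, key)  # (0, 1, 0, 1) [4, 3]
```

Enumeration grows exponentially with the number of tasks, which is why the abstract's approach of solving a sequence of single-objective subproblems as linear programs matters at realistic scale.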