Exploring Plan-Based Scheduling for Large-Scale Computing Systems

Cited by: 9
Authors
Zheng, Xingwu [1]
Zhou, Zhou [2]
Yang, Xu [2]
Lan, Zhiling [2]
Wang, Jia [1]
Affiliations
[1] IIT, Dept Elect & Comp Engn, Chicago, IL 60616 USA
[2] IIT, Dept Comp Sci, Chicago, IL 60616 USA
Source
2016 IEEE International Conference on Cluster Computing (CLUSTER) | 2016
Keywords
Plan-based scheduling; Simulated Annealing algorithm; Optimization;
DOI
10.1109/CLUSTER.2016.43
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
As HPC systems scale toward exascale, managing the underlying resources effectively becomes critical. Almost all existing resource management systems schedule jobs in a queuing fashion; they make isolated scheduling decisions that can compromise system performance, even with backfilling. Plan-based schedulers have the potential to generate better job schedules by producing an execution plan for all waiting jobs, but they have not received enough attention. In this paper, we present a novel plan-based scheduling system that uses simulated annealing as its optimization engine to support effective resource management on HPC systems. Extensive trace-based simulations with workload traces collected from a wide range of production supercomputers show that, compared with a queue-based scheduling system using FCFS with EASY backfilling, our plan-based scheduling system can reduce job wait time by 40% and job response time by 30% while slightly improving system utilization. Moreover, our plan-based system can run online, solving the scheduling problem at each scheduling iteration within one second, which makes it practical for production HPC systems.
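The general idea of searching over execution plans with simulated annealing can be illustrated with a minimal sketch. This is not the authors' system: the record does not give their objective function, neighborhood moves, or cooling schedule, so everything below is an assumption made for illustration. In particular, the sketch assumes all jobs are already waiting at time 0, each job is a hypothetical `(nodes, runtime)` pair, jobs start strictly in plan order, the cost is mean wait time, and the move is a swap of two plan positions.

```python
import heapq
import math
import random


def mean_wait(plan, jobs, total_nodes):
    """Mean wait time when jobs start strictly in plan order,
    each as early as enough nodes are free.
    Assumes every job needs at most total_nodes nodes."""
    free = total_nodes
    now = 0.0
    running = []  # min-heap of (end_time, nodes_held)
    total = 0.0
    for j in plan:
        nodes, runtime = jobs[j]
        # Advance time until enough nodes have been released.
        while free < nodes:
            end, released = heapq.heappop(running)
            now = max(now, end)
            free += released
        total += now  # all jobs arrived at time 0, so wait == start time
        free -= nodes
        heapq.heappush(running, (now + runtime, nodes))
    return total / len(plan)


def anneal(jobs, total_nodes, steps=2000, t0=10.0, cooling=0.995, seed=0):
    """Illustrative simulated annealing over job orderings (needs >= 2 jobs)."""
    rng = random.Random(seed)
    plan = list(range(len(jobs)))  # start from arrival (FCFS) order
    cost = mean_wait(plan, jobs, total_nodes)
    best_plan, best_cost = plan[:], cost
    temp = t0
    for _ in range(steps):
        i, k = rng.sample(range(len(plan)), 2)
        plan[i], plan[k] = plan[k], plan[i]  # propose a swap
        new_cost = mean_wait(plan, jobs, total_nodes)
        # Metropolis rule: always accept improvements, sometimes accept worse plans.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best_plan, best_cost = plan[:], cost
        else:
            plan[i], plan[k] = plan[k], plan[i]  # reject: undo the swap
        temp *= cooling  # geometric cooling (an assumed schedule)
    return best_plan, best_cost


# Hypothetical workload: (nodes, runtime) on a 4-node system.
jobs = [(4, 10.0), (1, 1.0), (1, 1.0), (2, 2.0)]
fcfs_cost = mean_wait([0, 1, 2, 3], jobs, 4)
best_plan, best_cost = anneal(jobs, 4)
```

In this toy workload the wide, long job at the head of the queue delays every job behind it, so a plan that runs the three small jobs first has a much lower mean wait. A production scheduler would also need to handle job arrivals, runtime estimates, and fairness, none of which this sketch models.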
Pages: 259-268
Page count: 10