Resource Aware Scheduling for EDA Regression Jobs

Cited by: 0
Authors
Nanda, Saurav [1]
Parthasarathy, Ganapathy [1]
Choudhary, Parivesh [1]
Venkatachar, Arun [1]
Affiliations
[1] Synopsys Inc, Mountain View, CA 94043 USA
Source
EURO-PAR 2019: PARALLEL PROCESSING WORKSHOPS | 2020 / Vol. 11997
Keywords
Job scheduling; Machine learning; K-means; Adaptive binning; Regression testing; Electronic Design Automation;
DOI
10.1007/978-3-030-48340-1_49
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Typical Integrated Circuit (IC) design projects use Electronic Design Automation (EDA) tool flows to launch thousands of regressions every day on shared compute grids to complete the IC design verification process. These regressions in turn launch compute jobs with varied resource requirements and inter-job dependency constraints. Traditional grid schedulers, such as the Univa Grid Engine (UGE) [12], prioritize fairness over performance to maximize the number of jobs run with equal distribution of resources at any time. A constant challenge in day-to-day operations is to schedule these jobs for minimum overall job completion time so that developers can expect predictable regression turn-around time (TAT). We propose a resource-aware scheduling mechanism that balances performance and fairness for real-world EDA-centric workloads. We present an analysis of historical profile information from a set of regressions with complex inter-job dependencies and highly variable resource requirements to show that many of these regression jobs are well suited for efficient packing on grid machines. We formulate the regression scheduling problem as a variant of the bin packing problem, where the size of bins and balls may vary according to job-resource requirements and differing server configurations on the grid. We propose using two analytic techniques, namely k-means clustering [8] and adaptive binning [10], to solve this problem. We then evaluate the performance of our proposed solution using real workloads from daily regressions on an enterprise compute grid.
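To make the bin-packing formulation in the abstract concrete, the sketch below packs jobs with differing core and memory demands onto machines with differing capacities. It uses a plain first-fit-decreasing heuristic, which is a textbook stand-in and not the k-means or adaptive-binning method the paper proposes. The `Machine` class, the job tuples, and all names and numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """A grid machine (a 'bin') with remaining capacity; hypothetical model."""
    name: str
    cores: int
    mem_gb: float
    jobs: list = field(default_factory=list)

    def fits(self, cores, mem_gb):
        # A job fits only if both resource dimensions are still available.
        return cores <= self.cores and mem_gb <= self.mem_gb

    def place(self, job_name, cores, mem_gb):
        # Reserve the resources and record the job on this machine.
        self.cores -= cores
        self.mem_gb -= mem_gb
        self.jobs.append(job_name)


def first_fit_decreasing(jobs, machines):
    """Pack (name, cores, mem_gb) jobs onto machines; return unplaced names.

    Jobs are considered largest-first (by memory, then cores) and each one
    goes to the first machine that still has room for it.
    """
    unplaced = []
    ordered = sorted(jobs, key=lambda j: (j[2], j[1]), reverse=True)
    for name, cores, mem_gb in ordered:
        target = next((m for m in machines if m.fits(cores, mem_gb)), None)
        if target is None:
            unplaced.append(name)
        else:
            target.place(name, cores, mem_gb)
    return unplaced


# Illustrative data only: two machine sizes and five jobs.
machines = [
    Machine("small", cores=4, mem_gb=16),
    Machine("large", cores=16, mem_gb=64),
]
jobs = [
    ("sim_a", 2, 8),
    ("sim_b", 8, 48),
    ("lint", 1, 2),
    ("synth", 4, 32),
    ("sim_c", 2, 8),
]
unplaced = first_fit_decreasing(jobs, machines)
for m in machines:
    print(m.name, m.jobs)
print("unplaced:", unplaced)
```

A real scheduler would also have to respect inter-job dependencies and job runtimes, which this sketch ignores; it only illustrates why heterogeneous bin and item sizes make the packing decision non-trivial.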
Pages: 639-651
Page count: 13
Related papers
50 in total
  • [11] Fregata: A Low-Latency and Resource-Efficient Scheduling for Heterogeneous Jobs in Clouds
    Liu, Jinwei
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (IEEE BIGCOMP 2022), 2022, : 15 - 22
  • [12] Scheduling Jobs in Grids Adaptively
    Chang, Ruay-Shiung
    Lin, Chih-Yuan
    Lin, Chun-Fu
    [J]. 2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS, PROCEEDINGS, 2009, : 19 - 25
  • [13] Challenges and Opportunities of Security-Aware EDA
    Feldtkeller, Jakob
    Sasdrich, Pascal
    Gueneysu, Tim
    [J]. ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2023, 22 (03)
  • [14] A Two-Phase Energy-Aware Scheduling Approach for CPU-Intensive Jobs in Mobile Grids
    Hirsch, Matias
    Manuel Rodriguez, Juan
    Mateos, Cristian
    Zunino, Alejandro
    [J]. JOURNAL OF GRID COMPUTING, 2017, 15 (01) : 55 - 80
  • [15] A Two-Phase Energy-Aware Scheduling Approach for CPU-Intensive Jobs in Mobile Grids
    Matías Hirsch
    Juan Manuel Rodríguez
    Cristian Mateos
    Alejandro Zunino
    [J]. Journal of Grid Computing, 2017, 15 : 55 - 80
  • [16] Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing
    Luo, Yizhou
    Wang, Qiang
    Shi, Shaohuai
    Lai, Jiaxin
    Qi, Shuhan
    Zhang, Jiajia
    Wang, Xuan
    [J]. 2024 IEEE/ACM 32ND INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE, IWQOS, 2024,
  • [17] Performance and energy aware scheduling simulator for HPC: evaluating different resource selection methods
    Gomez-Martin, Cesar
    Vega-Rodriguez, Miguel A.
    Gonzalez-Sanchez, Jose-Luis
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2015, 27 (17) : 5436 - 5459
  • [18] An algorithm for scheduling jobs in hypercube systems
    Kwon, OH
    Chwa, KY
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1998, 9 (09) : 856 - 860
  • [19] SCHEDULING JOBS WITH TEMPORAL DISTANCE CONSTRAINTS
    HAN, CC
    LIN, KJ
    LIU, JWS
    [J]. SIAM JOURNAL ON COMPUTING, 1995, 24 (05) : 1104 - 1121
  • [20] Liquid: Intelligent Resource Estimation and Network-Efficient Scheduling for Deep Learning Jobs on Distributed GPU Clusters
    Gu, Rong
    Chen, Yuquan
    Liu, Shuai
    Dai, Haipeng
    Chen, Guihai
    Zhang, Kai
    Che, Yang
    Huang, Yihua
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (11) : 2808 - 2820