Resource Aware Scheduling for EDA Regression Jobs

Cited by: 0

Authors:

Nanda, Saurav [1]

Parthasarathy, Ganapathy [1]

Choudhary, Parivesh [1]

Venkatachar, Arun [1]

Affiliations:

[1] Synopsys Inc, Mountain View, CA 94043 USA

Source:

EURO-PAR 2019: PARALLEL PROCESSING WORKSHOPS | 2020 / Vol. 11997

Keywords:

Job scheduling; Machine learning; K-means; Adaptive binning; Regression testing; Electronic Design Automation;

DOI:

10.1007/978-3-030-48340-1_49

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Typical Integrated Circuit (IC) design projects use Electronic Design Automation (EDA) tool flows to launch thousands of regressions every day on shared compute grids to complete the IC design verification process. These regressions in turn launch compute jobs with varied resource requirements and inter-job dependency constraints. Traditional grid schedulers, such as the Univa Grid Engine (UGE) [12], prioritize fairness over performance to maximize the number of jobs run with equal distribution of resources at any time. A constant challenge in day-to-day operations is to schedule these jobs for minimum overall job completion time so that developers can expect predictable regression turn-around time (TAT). We propose a resource-aware scheduling mechanism that balances performance and fairness for real-world EDA-centric workloads. We present an analysis of historical profile information from a set of regressions with complex inter-job dependencies and highly variable resource requirements to show that many of these regression jobs are well suited for efficient packing on grid machines. We formulate the regression scheduling problem as a variant of the bin packing problem, where the size of bins and balls may vary according to job-resource requirements and differing server configurations on the grid. We propose using two analytic techniques, namely k-means clustering [8] and adaptive binning [10], to solve this problem. We then evaluate the performance of our proposed solution using real workloads from daily regressions on an enterprise compute grid.
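
The bin packing formulation in the abstract can be illustrated with a small sketch. This is not the paper's method: it uses a plain first-fit-decreasing heuristic instead of k-means clustering or adaptive binning, and it ignores inter-job dependencies. The `Job` and `Machine` types, the job names, and the resource figures are all hypothetical, chosen only to show variable-sized items being packed into variable-sized bins.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical job record: requested cores and memory.
    name: str
    cores: int
    mem_gb: float


@dataclass
class Machine:
    # Hypothetical grid machine; cores and mem_gb track remaining free capacity.
    name: str
    cores: int
    mem_gb: float
    jobs: list = field(default_factory=list)

    def fits(self, job):
        return job.cores <= self.cores and job.mem_gb <= self.mem_gb

    def place(self, job):
        self.cores -= job.cores
        self.mem_gb -= job.mem_gb
        self.jobs.append(job.name)


def first_fit_decreasing(jobs, machines):
    """Place jobs largest-first on the first machine with enough free capacity."""
    unplaced = []
    # Sort by memory, then cores, so the hardest-to-place jobs go first.
    for job in sorted(jobs, key=lambda j: (j.mem_gb, j.cores), reverse=True):
        target = next((m for m in machines if m.fits(job)), None)
        if target is None:
            unplaced.append(job.name)
        else:
            target.place(job)
    return unplaced


jobs = [Job("sim_a", 4, 32), Job("sim_b", 2, 8), Job("lint", 1, 2), Job("synth", 8, 64)]
machines = [Machine("small", 8, 32), Machine("large", 16, 128)]
unplaced = first_fit_decreasing(jobs, machines)
placement = {m.name: m.jobs for m in machines}
```

With these made-up inputs, every job is placed: `sim_a` fills the memory of `small`, and the remaining three jobs share `large`. A scheduler along the lines the abstract describes would presumably derive job sizes from historical profiles rather than from fixed requests.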


Pages: 639-651

Page count: 13

