Fault-tolerant parallel scheduling of tasks on a heterogeneous high-performance workstation cluster

Cited by: 2
Author
Kwok, YK [1]
Affiliation
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
parallel algorithms; cluster computing; heterogeneous systems; fault-tolerant scheduler; task graphs; neighborhood search
DOI
10.1023/A:1011186732749
CLC classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
We propose a new approach, called cluster-based search (CBS), for scheduling large task graphs in parallel on a heterogeneous cluster of workstations connected by a high-speed network (e.g., an ATM switch running at OC-3 speed). The CBS algorithm uses a parallel random neighborhood search that refines several different initial schedules simultaneously, each on a different workstation. The workstations communicate periodically to exchange the best solutions found so far, directing the search toward more promising regions of the search space. The heterogeneity of the machines is exploited through biased partitioning of the search space. The parallel random neighborhood search is fault-tolerant: the workload of a failed workstation is automatically redistributed to the remaining workstations so that the search can continue. We have implemented the CBS algorithm as a core function of our ongoing development of single-system-image (SSI) middleware for a Sun workstation cluster.
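The Python sketch below illustrates how the ideas in the abstract could fit together. It is not the author's CBS implementation, and everything concrete in it is an assumption made for illustration: the small task graph (`COST`, `EDGES`), the identical target processors, the helper names (`makespan`, `give_to_least_loaded`, `cbs_sketch`), and the reading of "biased partitioning" as giving each workstation a share of the tasks to perturb in proportion to its speed. The workstations are simulated round-robin in a single process rather than exchanging messages over a network.

```python
import random

# Hypothetical task graph (not from the paper); task ids are in topological order.
COST = [2, 3, 3, 4, 2, 3, 1]  # computation cost of each task
EDGES = {(0, 1): 1, (0, 2): 2, (1, 3): 1, (2, 3): 1,
         (2, 4): 3, (3, 5): 2, (4, 5): 1, (5, 6): 1}  # communication cost of each edge
PARENTS = {t: [u for (u, v) in EDGES if v == t] for t in range(len(COST))}


def makespan(assign, num_procs):
    """Schedule tasks in topological order on their assigned (identical) target processors."""
    ready = [0.0] * num_procs  # time at which each target processor becomes free
    finish = [0.0] * len(COST)
    for t, p in enumerate(assign):
        # Data from a parent on another processor arrives after the edge's communication cost.
        data_ready = max((finish[u] + (EDGES[(u, t)] if assign[u] != p else 0)
                          for u in PARENTS[t]), default=0.0)
        finish[t] = max(ready[p], data_ready) + COST[t]
        ready[p] = finish[t]
    return max(finish)


def give_to_least_loaded(parts, speeds, task):
    """Assumed biased partitioning: give the task to the workstation with least load per unit speed."""
    w = min(speeds, key=lambda w: (len(parts[w]) + 1) / speeds[w])
    parts[w].append(task)


def cbs_sketch(num_procs, speeds, rounds=30, moves=20, exchange_every=5, fail=None, seed=0):
    """Hypothetical simulated cooperative search; fail=(workstation, round) kills one of >= 2 workers."""
    rng = random.Random(seed)
    n = len(COST)
    speeds = dict(speeds)
    parts = {w: [] for w in speeds}
    for t in range(n):
        give_to_least_loaded(parts, speeds, t)
    # Each workstation starts from its own random initial schedule.
    current = {w: [rng.randrange(num_procs) for _ in range(n)] for w in speeds}
    best = min(current.values(), key=lambda a: makespan(a, num_procs))
    for r in range(rounds):
        if fail is not None and r == fail[1] and fail[0] in speeds:
            # Fault tolerance: the failed workstation's share of the search is redistributed.
            dead = fail[0]
            orphans = parts.pop(dead)
            del speeds[dead], current[dead]
            for t in orphans:
                give_to_least_loaded(parts, speeds, t)
        for w in list(current):
            sol = current[w]
            for _ in range(moves):
                if not parts[w]:
                    break
                cand = sol[:]
                # Random neighbor: move one task from this workstation's partition to a random processor.
                cand[rng.choice(parts[w])] = rng.randrange(num_procs)
                if makespan(cand, num_procs) <= makespan(sol, num_procs):
                    sol = cand
            current[w] = sol
        if (r + 1) % exchange_every == 0:
            # Periodic exchange: every workstation continues from the best schedule found so far.
            best = min([best, *current.values()], key=lambda a: makespan(a, num_procs))
            current = {w: best[:] for w in current}
    best = min([best, *current.values()], key=lambda a: makespan(a, num_procs))
    return best, makespan(best, num_procs)
```

For example, `cbs_sketch(2, {"ws1": 2.0, "ws2": 1.0, "ws3": 1.0}, fail=("ws2", 10))` simulates a three-workstation run in which `ws2` fails at round 10; its tasks are handed to `ws1` and `ws3`, and the search continues. Because each exchange keeps the best schedule seen so far, the returned schedule length never exceeds that of the best initial schedule.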
Pages: 299-314
Number of pages: 16