Fault-tolerant parallel scheduling of tasks on a heterogeneous high-performance workstation cluster

Cited by: 2
Authors
Kwok, YK [1]
Affiliations
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
parallel algorithms; cluster computing; heterogeneous systems; fault-tolerant scheduler; task graphs; neighborhood search
DOI
10.1023/A:1011186732749
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We propose a new approach, called cluster-based search (CBS), for scheduling large task graphs in parallel on a heterogeneous cluster of workstations connected by a high-speed network (e.g., using an ATM switch at OC-3 speed). The CBS algorithm uses a parallel random neighborhood search which works by refining multiple different initial schedules simultaneously using different workstations. The workstations communicate periodically to exchange their best solutions found thus far in order to direct the search to more promising regions in the search space. Heterogeneity of machines is exploited by the biased partitioning of the search space. The parallel random neighborhood search is fault-tolerant in that the workload of a failed workstation is automatically redistributed to other workstations so that the search can continue. We have implemented the CBS algorithm as a core function of our on-going development of SSI middleware for a Sun workstation cluster.
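The search loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's CBS implementation: the names `makespan`, `partition`, and `cbs_sketch` are hypothetical, communication costs between tasks are ignored, the workstations are simulated one after another in a single process, and "biased partitioning" is interpreted here (as an assumption) as giving faster workstations a larger share of the tasks whose processor assignment they may change.

```python
import random

def makespan(assign, order, preds, cost):
    """Finish time of the last task; cost[t][p] is the run time of task t on processor p."""
    finish, proc_free = {}, {}
    for t in order:  # order must be a topological order of the task graph
        p = assign[t]
        ready = max((finish[u] for u in preds[t]), default=0)
        start = max(ready, proc_free.get(p, 0))
        finish[t] = proc_free[p] = start + cost[t][p]
    return max(finish.values())

def partition(tasks, speeds):
    """Split tasks into regions sized roughly in proportion to worker speed (assumed bias rule)."""
    total = sum(speeds.values())
    workers = sorted(speeds)
    regions, start = {}, 0
    for i, w in enumerate(workers):
        if i == len(workers) - 1:
            end = len(tasks)  # last worker takes whatever remains
        else:
            end = start + round(len(tasks) * speeds[w] / total)
        regions[w] = tasks[start:end]
        start = end
    return regions

def cbs_sketch(order, preds, cost, n_procs, speeds,
               rounds=20, steps=50, fail_at=None, seed=0):
    """Simulated parallel random neighborhood search with exchange and failover.

    fail_at is an optional (round_index, worker_name) pair that simulates a crash.
    """
    rng = random.Random(seed)
    tasks = list(order)
    score = lambda a: makespan(a, order, preds, cost)
    alive = dict(speeds)
    regions = partition(tasks, alive)
    # Each worker refines a different random initial schedule (task -> processor).
    current = {w: [rng.randrange(n_procs) for _ in tasks] for w in sorted(alive)}
    best = list(min(current.values(), key=score))
    for r in range(rounds):
        if fail_at and r == fail_at[0] and fail_at[1] in alive and len(alive) > 1:
            failed = fail_at[1]
            del alive[failed]
            del current[failed]
            # Redistribute the failed worker's region among the survivors.
            for w, extra in partition(regions.pop(failed), alive).items():
                regions[w] = regions[w] + extra
        for w in alive:
            sched, s = current[w], score(current[w])
            for _ in range(steps):
                if not regions[w]:
                    break
                t = rng.choice(regions[w])       # random neighbor: move one task
                old, sched[t] = sched[t], rng.randrange(n_procs)
                new_s = score(sched)
                if new_s <= s:
                    s = new_s                    # keep moves that do not worsen the schedule
                else:
                    sched[t] = old               # undo a worsening move
        # Periodic exchange: every surviving worker continues from the best schedule so far.
        round_best = min(current.values(), key=score)
        if score(round_best) < score(best):
            best = list(round_best)
        for w in alive:
            current[w] = list(best)
    return best, score(best)
```

For example, `cbs_sketch(order, preds, cost, n_procs=2, speeds={"ws_fast": 3, "ws_slow": 1}, fail_at=(5, "ws_slow"))` simulates a run in which the slower workstation fails at round 5 and its tasks are handed to the remaining one. A real deployment would run the workers concurrently and detect failures through timeouts or heartbeats, which this sketch does not model.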
Pages: 299-314
Page count: 16