Fault-tolerant parallel scheduling of tasks on a heterogeneous high-performance workstation cluster

Cited by: 2
Authors
Kwok, YK [1]
Affiliation
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Hong Kong, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2001 / Vol. 19 / Issue 03
Keywords
parallel algorithms; cluster computing; heterogeneous systems; fault-tolerant scheduler; task graphs; neighborhood search;
DOI
10.1023/A:1011186732749
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
We propose a new approach, called cluster-based search (CBS), for scheduling large task graphs in parallel on a heterogeneous cluster of workstations connected by a high-speed network (e.g., using an ATM switch at OC-3 speed). The CBS algorithm uses a parallel random neighborhood search which works by refining multiple different initial schedules simultaneously using different workstations. The workstations communicate periodically to exchange their best solutions found thus far in order to direct the search to more promising regions in the search space. Heterogeneity of machines is exploited by the biased partitioning of the search space. The parallel random neighborhood search is fault-tolerant in that the workload of a failed workstation is automatically redistributed to other workstations so that the search can continue. We have implemented the CBS algorithm as a core function of our on-going development of SSI middleware for a Sun workstation cluster.
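The ideas named in the abstract (random neighborhood search, periodic exchange of best solutions, a bias toward faster machines, and continuing after a failure) can be illustrated with a small sequential simulation. This is a minimal sketch under stated assumptions, not the CBS algorithm from the paper: the function names, the `worker_power` and `fail_at` parameters, the single-task reassignment move, and the choice to bias the per-worker step budget (rather than partition the search space) are all hypothetical, and communication costs between tasks are ignored.

```python
import random

def makespan(assign, order, preds, cost, speed):
    """Finish time of the last task when tasks start in topological `order`."""
    finish = {}
    free = [0.0] * len(speed)              # time at which each machine is idle
    for t in order:
        m = assign[t]
        ready = max((finish[p] for p in preds[t]), default=0.0)
        start = max(ready, free[m])
        finish[t] = start + cost[t] / speed[m]
        free[m] = finish[t]
    return max(finish.values())

def refine(assign, steps, rng, order, preds, cost, speed):
    """Random neighborhood search: reassign one random task, keep the move
    if the schedule does not get longer."""
    best = list(assign)
    best_len = makespan(best, order, preds, cost, speed)
    for _ in range(steps):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.randrange(len(speed))
        cand_len = makespan(cand, order, preds, cost, speed)
        if cand_len <= best_len:
            best, best_len = cand, cand_len
    return best, best_len

def search_sketch(order, preds, cost, speed, worker_power,
                  rounds=10, budget=200, fail_at=None, seed=0):
    """Simulate several search workers that refine different initial schedules
    and exchange the best one after every round.

    worker_power: assumed relative speed of each search worker.
    fail_at: optional (round, worker) pair simulating a workstation failure.
    """
    rng = random.Random(seed)
    alive = list(range(len(worker_power)))
    # Each worker starts from its own random initial schedule.
    states = {w: [rng.randrange(len(speed)) for _ in order] for w in alive}
    best, best_len = None, float("inf")
    for r in range(rounds):
        if fail_at and r == fail_at[0] and fail_at[1] in alive and len(alive) > 1:
            alive.remove(fail_at[1])       # simulated failure
        # The round budget is shared among surviving workers in proportion to
        # their power, so a failed worker's share goes to the others.
        total = sum(worker_power[w] for w in alive)
        for w in alive:
            steps = max(1, round(budget * worker_power[w] / total))
            states[w], length = refine(states[w], steps, rng,
                                       order, preds, cost, speed)
            if length < best_len:
                best, best_len = list(states[w]), length
        # Periodic exchange: every surviving worker continues from the best.
        for w in alive:
            states[w] = list(best)
    return best, best_len

if __name__ == "__main__":
    # Diamond-shaped task graph: 0 -> {1, 2} -> 3, on one slow and one fast machine.
    preds = [[], [0], [0], [1, 2]]
    schedule, length = search_sketch([0, 1, 2, 3], preds, [2, 4, 4, 2],
                                     [1.0, 2.0], worker_power=[1, 2, 3],
                                     fail_at=(3, 2))
    print(schedule, length)
```

Restarting every worker from the shared best schedule is the simplest possible exchange rule; it trades diversity for focus, and a real implementation would likely keep some workers exploring elsewhere.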
Pages: 299 - 314
Page count: 16
Related papers
50 in total
  • [11] QoS-Aware Fault-Tolerant Scheduling for Real-Time Tasks on Heterogeneous Clusters
    Zhu, Xiaomin
    Qin, Xiao
    Qiu, Meikang
    IEEE TRANSACTIONS ON COMPUTERS, 2011, 60 (06) : 800 - 812
  • [12] Real-time fault-tolerant scheduling algorithm of periodic tasks in heterogeneous distributed systems
    School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
    Jisuanji Xuebao, 2007, 10 (1740-1749):
  • [13] An Efficient Fault-tolerant Scheduling Algorithm for Periodic Real-time Tasks in Heterogeneous Platforms
    Qiu, Weiwei
    Zheng, Zibin
    Wang, Xinyu
    Yang, Xiaohu
    2013 IEEE 16TH INTERNATIONAL SYMPOSIUM ON OBJECT/COMPONENT/SERVICE-ORIENTED REAL-TIME DISTRIBUTED COMPUTING (ISORC), 2013,
  • [14] Fault-Tolerant Online Packet Scheduling on Parallel Channels
    Garncarek, Pawel
    Jurdzinski, Tomasz
    Lorys, Krzysztof
    2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2017, : 347 - 356
  • [15] Scheduling the tasks of multiple AGVs in a fault-tolerant control way
    Majdzik, P.
    Witczak, M.
    Mrugalski, M.
    IFAC PAPERSONLINE, 2023, 56 (02): : 150 - 155
  • [16] Fault-tolerant scheduling for Bag-of-Tasks Grid applications
    Anglano, C
    Canonico, M
    ADVANCES IN GRID COMPUTING - EGC 2005, 2005, 3470 : 630 - 639
  • [17] Fault-tolerant scheduling
    Kalyanasundaram, B
    Pruhs, KR
    SIAM JOURNAL ON COMPUTING, 2005, 34 (03) : 697 - 719
  • [18] Fault-tolerant real-time tasks scheduling with dynamic fault handling
    Chen, Gang
    Guan, Nan
    Huang, Kai
    Yi, Wang
    JOURNAL OF SYSTEMS ARCHITECTURE, 2020, 102 (102)
  • [19] High-performance fault-tolerant CORDIC processor for space applications
    Wang, Sicong
    Wen, Zhiping
    Yu, Lixin
    ISSCAA 2006: 1ST INTERNATIONAL SYMPOSIUM ON SYSTEMS AND CONTROL IN AEROSPACE AND ASTRONAUTICS, VOLS 1 AND 2, 2006, : 360 - +
  • [20] A high-performance application protocol for fault-tolerant CAN networks
    Bertoluzzo, Manuele
    Buja, Giuseppe
    IEEE INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS (ISIE 2010), 2010, : 1705 - 1710