Fault tolerance in the WebCom metacomputer

Cited by: 2
Authors
Morrison, JP [1 ]
Kennedy, JJ [1 ]
Power, DA [1 ]
Affiliations
[1] Natl Univ Ireland Univ Coll Cork, Dept Comp Sci, Cork, Ireland
Source
INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS, PROCEEDINGS | 2001
Keywords
fault tolerance; condensed graphs; metacomputing; distributed computing; WebCom;
DOI
10.1109/ICPPW.2001.951958
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
This paper addresses fault tolerance in the WebCom metacomputer. WebCom's computation platform is dynamically reconfigurable and volunteer-based. Since its constituent machines may join and leave unpredictably, fault survival and efficient fault recovery are of paramount importance. A fault tolerance mechanism is outlined, which relies on a fast and efficient processor replacement procedure. It is shown that the characteristics of this procedure, together with the hierarchical and referentially transparent nature of WebCom executions, can be used to limit the effect of a fault to its immediate neighbourhood.
Pages: 245-250
Page count: 4
Related papers
9 in total
  • [1] CAPPELLO P, 1997, ACM WORKSH JAV SCI E
  • [2] DIETER WR, 1997, P 1997 IEEE AER C FE, V2, P525
  • [3] Hills AD, 1996, GEC REV, V11, P11
  • [4] KARUL M, 1998, THESIS NEW YORK U
  • [5] MORISON JP, 2001, TR0133 U COLL
  • [6] MORRISON J, 1996, THESIS EINDHOVEN
  • [7] WebCom: A Web based volunteer computer
    Morrison, JP
    Kennedy, JJ
    Power, DA
    [J]. JOURNAL OF SUPERCOMPUTING, 2001, 18 (01) : 47 - 61
  • [8] RAMKUMAR B, 1997, P 27 INT S FAULT TOL, P58
  • [9] SARMENTA LFG, 1998, ACM 1998 WORKSH JAV