ART: Robustness of meshes and tori for parallel and distributed computation

被引:0
|
作者
Yeh, CH [1 ]
Parhami, B [1 ]
机构
[1] Queens Univ, Dept Elect & Comp Engn, Kingston, ON K7L 3N6, Canada
关键词
D O I
10.1109/ICPP.2002.1040903
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In this paper, we formulate the array robustness theorems (ARTs) for efficient computation and communication on faulty arrays. No hardware redundancy is required and no assumption is made about the availability of a complete submesh or subtorus. Based on ARTs, a very wide variety of problems, including sorting, FFT, total exchange, permutation, and some matrix operations, can be solved with a slowdown factor of 1 + o (1). The number of faults tolerated by ARTs ranges from o(min(n(1-1/d), (n)/(d), (n)/(h))) for n-ary d-cubes with worst-case faults to as large as o(N) for most N-node 2-D meshes or tori with random faults, where h is the number of data items per processor The resultant running times are the best results reported thus far for solving many problems on faulty arrays. Based on ARTs and several other components such as robust libraries, the priority emulation discipline, and X'Y' routing, we introduce the robust adaptation interface layer (RAIL) as a middleware between ordinary algorithms/programs (that are originally developed for fault-free arrays) and the faulty network/hardware. In effect, RAIL provides a virtual fault-free network to higher layers, while ordinary algorithms/programs are transformed through RAIL into corresponding robust algorithms/programs that can run on faulty networks.
引用
收藏
页码:463 / 472
页数:10
相关论文
共 50 条
  • [1] Distributed submesh determination in faulty tori and meshes
    Chen, HL
    Hu, SH
    11TH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM, PROCEEDINGS, 1997, : 65 - 70
  • [2] Parallel Homology Computation of Meshes
    Damiand, Guillaume
    Gonzalez-Diaz, Rocio
    COMPUTATIONAL TOPOLOGY IN IMAGE CONTEXT, CTIC 2016, 2016, 9667 : 53 - 64
  • [3] Massively parallel computation on anisotropic meshes
    Digonnet, H.
    Silva, L.
    Coupez, T.
    Adaptive Modeling and Simulation 2013 - Proceedings of the 6th International Conference on Adaptive Modeling and Simulation, ADMOS 2013, 2013, : 199 - 211
  • [4] On the Computation of Reducible Invariant Tori on a Parallel Computer
    Jorba, Angel
    Olmedo, Estrella
    SIAM JOURNAL ON APPLIED DYNAMICAL SYSTEMS, 2009, 8 (04): : 1382 - 1404
  • [5] Gossiping on meshes and tori
    Juurlink, BHH
    Sibeyn, JF
    Rao, PS
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1998, 9 (06) : 513 - 525
  • [6] Distributed parallel computation in DP grid system - art. no. 678920
    Sun, Mingwei
    Zhang, Jianqing
    Chen, Yanan
    MIPPR 2007: MEDICAL IMAGING, PARALLEL PROCESSING OF IMAGES, AND OPTIMIZATION TECHNIQUES, 2007, 6789 : 78920 - 78920
  • [7] Distributed parallel Grobner bases computation
    Kredel, Heinz
    CISIS: 2009 INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT AND SOFTWARE INTENSIVE SYSTEMS, VOLS 1 AND 2, 2009, : 518 - 524
  • [8] EMBEDDINGS AMONG MESHES AND TORI
    MA, E
    TAO, LX
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1993, 18 (01) : 44 - 55
  • [9] Distributed and Parallel Computation of the Canonical Direct Basis
    Viaud, Jean-Francois
    Bertet, Karell
    Missaoui, Rokia
    Demko, Christophe
    FORMAL CONCEPT ANALYSIS, ICFCA 2017, 2017, 10308 : 228 - 241
  • [10] The research of distributed parallel computation based JXTA
    Liu, Chun
    Guo, Qingping
    DCABES 2007 Proceedings, Vols I and II, 2007, : 875 - 877