A fault tolerant MPI-IO implementation using the Expand parallel file system

Cited by: 2
Authors
Calderón, A [1 ]
García-Carballeira, F [1 ]
Carretero, J [1 ]
Pérez, JM [1 ]
Sánchez, LM [1 ]
Affiliations
[1] Univ Carlos III Madrid, Dept Comp Sci, Comp Architecture Grp, Madrid, Spain
Source
13TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, PROCEEDINGS | 2005
Keywords
parallel file system; NFS; data declustering; clusters; fault-tolerance;
DOI
10.1109/EMPDP.2005.3
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Subject classification code
081202;
Abstract
Parallelism in file systems is obtained by using several independent server nodes, each supporting one or more secondary storage devices. This approach increases the performance and scalability of the system, but a fault in a single node can stop the whole system. To avoid this problem, data must be stored using some kind of redundancy technique, so that any data stored in a faulty element can be recovered. Fault tolerance can be provided in I/O systems using replication or RAID-based schemes. However, most current systems apply the same technique to all files in the system. This paper describes the fault tolerance support provided by Expand, a parallel file system based on standard servers. Expand allows different fault-tolerance mechanisms to be defined at the file level. The evaluation compares the performance of Expand under different configurations with that of PVFS using the FLASH-I/O benchmark.
Pages: 274-281
Page count: 8
Related papers
50 in total
  • [41] Application Level Fault Recovery: Using Fault-Tolerant Open MPI in a PDE Solver
    Ali, Md Mohsin
    Southern, James
    Strazdins, Peter
    Harding, Brendan
    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014, : 1170 - 1179
  • [42] Fault tolerant QR-decomposition algorithm and its parallel implementation
    Maslennikow, O
    Kaniewski, J
    Wyrzykowski, R
    EURO-PAR '98 PARALLEL PROCESSING, 1998, 1470 : 798 - 803
  • [43] IMPLEMENTATION OF A FAULT TOLERANT DISTRIBUTED CONTROL-SYSTEM
    TRIPP, RP
    HUBBY, RN
    ISA TRANSACTIONS, 1991, 30 (04) : 33 - 43
  • [44] Implementation of intelligent active fault tolerant control system
    Postalcioglu, Seda
    Erkan, Kadir
    Bolat, Emine Dogru
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS: KES 2007 - WIRN 2007, PT I, PROCEEDINGS, 2007, 4692 : 804 - +
  • [45] Design and implementation of the attempto fault-tolerant system
    Gunter, Willi
    Computer Systems Science and Engineering, 1993, 8 (02): : 101 - 108
  • [46] IMPLEMENTATION OF AN EXPERIMENTAL FAULT-TOLERANT MEMORY SYSTEM
    CARTER, WC
    MCCARTHY, CE
    IEEE TRANSACTIONS ON COMPUTERS, 1976, 25 (06) : 557 - 568
  • [47] DESIGN AND IMPLEMENTATION OF THE ATTEMPTO FAULT-TOLERANT SYSTEM
    GUNTER, W
    COMPUTING SYSTEMS, 1993, 8 (02): : 101 - 108
  • [48] Parallel I/O Prefetching Using MPI File Caching and I/O Signatures
    Byna, Surendra
    Chen, Yong
    Sun, Xian-He
    Thakur, Rajeev
    Gropp, William
    INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2008, : 350 - +
  • [49] Implementation and evaluation of prefetching in the Intel Paragon parallel file system
    Arunachalam, M
    Choudhary, A
    Rullman, B
    10TH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM - PROCEEDINGS OF IPPS '96, 1996, : 554 - 559
  • [50] High Performance and Fault Tolerant Distributed File System for Big Data Storage and Processing using Hadoop
    Sivaraman, E.
    Manickachezian, R.
    2014 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING APPLICATIONS (ICICA 2014), 2014, : 32 - 36