A fault tolerant MPI-IO implementation using the Expand parallel file system

Cited by: 2
Authors
Calderón, A [1]
García-Carballeira, F [1]
Carretero, J [1]
Pérez, JM [1]
Sánchez, LM [1]
Affiliations
[1] Univ Carlos III Madrid, Dept Comp Sci, Comp Architecture Grp, Madrid, Spain
Source
13TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, PROCEEDINGS | 2005
Keywords
parallel file system; NFS; data declustering; clusters; fault-tolerance
DOI
10.1109/EMPDP.2005.3
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Parallelism in file systems is obtained by using several independent server nodes, each supporting one or more secondary storage devices. This approach increases the performance and scalability of the system, but a fault in a single node can stop the whole system. To avoid this problem, data must be stored redundantly, so that any data held on a faulty element can be recovered. Fault tolerance can be provided in I/O systems using replication or RAID-based schemes. However, most current systems apply the same technique to all files in the system. This paper describes the fault tolerance support provided by Expand, a parallel file system based on standard servers. Expand allows different fault-tolerance mechanisms to be defined at the file level. The evaluation compares the performance of Expand under different configurations with that of PVFS using the FLASH-I/O benchmark.
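As an illustration of the RAID-based redundancy mentioned in the abstract, the following minimal Python sketch stripes a buffer across several simulated servers, stores an XOR parity block, and rebuilds the block of one failed server. It is not Expand's implementation: the function names, the single-stripe layout, and the block size are hypothetical and chosen only to show the general idea.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, n_servers, block_size):
    """Split data into one stripe of n_servers blocks plus one parity block.

    Hypothetical layout: a single stripe, zero-padded to full length.
    """
    if len(data) > n_servers * block_size:
        raise ValueError("data does not fit in a single stripe")
    padded = data.ljust(n_servers * block_size, b"\0")
    blocks = [padded[i * block_size:(i + 1) * block_size]
              for i in range(n_servers)]
    return blocks, xor_blocks(blocks)


def recover(blocks, parity, failed):
    """Rebuild the block of the failed server from the survivors and parity."""
    survivors = [b for i, b in enumerate(blocks) if i != failed]
    return xor_blocks(survivors + [parity])


if __name__ == "__main__":
    blocks, parity = stripe_with_parity(b"parallel file system", 4, 5)
    # Simulate the loss of server 2 and rebuild its block.
    print(recover(blocks, parity, failed=2) == blocks[2])
```

Replication, the other scheme the abstract names, would instead keep a full copy of each block on a second server, trading extra storage for simpler recovery.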
Pages: 274-281
Page count: 8
Related papers
50 in total
  • [31] Design and implementation of a parallel algorithm for solving a special linear system using MPI and SHMEM
    Wankar, R
    Rainald, E
    Chaudhari, NS
    IETE JOURNAL OF RESEARCH, 2002, 48 (01) : 15 - 22
  • [32] On the Design and Implementation of a Simulator for Parallel File System Research
    Liu, Yonggang
    Figueiredo, Renato
    Xu, Yiqi
    Zhao, Ming
    2013 IEEE 29TH SYMPOSIUM ON MASS STORAGE SYSTEMS AND TECHNOLOGIES (MSST), 2013,
  • [33] Towards a High Performance Implementation of MPI-IO on the Lustre File System
    Dickens, Phillip
    Logan, Jeremy
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2008, PART I, 2008, 5331 : 870 - 885
  • [34] Fault Tolerant Parallel FFT Using Parallel Failure recovery
    Fu, Hongyi
    Yang, Xuejun
    PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE OF COMPUTATIONAL SCIENCES AND ITS APPLICATIONS, 2009, : 257 - 261
  • [35] A high performance implementation of MPI-IO for a Lustre file system environment
    Dickens, Phillip M.
    Logan, Jeremy
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2010, 22 (11) : 1433 - 1449
  • [36] Optimization of nonblocking MPI-I/O to a remote parallel virtual file system using a circular buffer
    Tsujita, Y
    HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2005, 3726 : 585 - 594
  • [37] Implementation of parallel collection equi-join using MPI
    Lee, NK
    Taniar, D
    Rahayu, JW
    Ashrafi, MZ
    APPLIED PARALLEL COMPUTING: ADVANCED SCIENTIFIC COMPUTING, 2002, 2367 : 217 - 226
  • [38] Parallel implementation of minimum spanning tree algorithms using MPI
    Loncar, Vladimir
    Skrbic, Srdjan
    13TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND INFORMATICS (CINTI 2012), 2012, : 35 - 38
  • [39] Fault-tolerant architecture and implementation of a distributed control system using containers
    Bernardino Tamanaka, Gustavo Teruo
    Aroca, Rafael Vidal
    de Paula Caurin, Glauco Augusto
    2022 LATIN AMERICAN ROBOTICS SYMPOSIUM (LARS), 2022 BRAZILIAN SYMPOSIUM ON ROBOTICS (SBR), AND 2022 WORKSHOP ON ROBOTICS IN EDUCATION (WRE), 2022, : 211 - 216