Coding for high availability of a distributed-parallel storage system

Times cited: 12
Authors
Malluhi, QM
Johnston, WE
Affiliations
[1] Jackson State Univ, Dept Comp Sci, Jackson, MS 39217 USA
[2] Ernesto Orlando Lawrence Berkeley Natl Lab, Informat & Comp Sci Div, Berkeley, CA 94720 USA
Keywords
storage systems; availability; scalability; RAID; high performance; distributed systems; error-correcting codes
DOI
10.1109/71.737699
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We have developed a distributed parallel storage system that employs the aggregate bandwidth of multiple data servers connected by a high-speed wide-area network to achieve scalability and high data throughput. This paper studies different schemes to enhance the reliability and availability of such network-based distributed storage systems. The general approach of this paper employs "erasure" error-correcting codes that can be used to reconstruct missing information caused by hardware, software, or human faults. The paper describes the approach and develops optimized algorithms for the encoding and decoding operations. Moreover, the paper presents techniques for reducing the communication and computation overhead incurred while reconstructing missing data from the redundant information. These techniques include clustering, multidimensional coding, and the full two-dimensional parity schemes. The paper considers trade-offs between redundancy, fault tolerance, and complexity of error recovery.
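The abstract names erasure coding and a full two-dimensional parity scheme but gives no details here. The following is a minimal Python sketch of generic XOR row and column parity with a simple iterative ("peeling") decoder, assuming equal-size blocks and known erasure positions (`None` marks a block lost along with its server). The function names `xor_blocks`, `encode_2d`, and `recover` are hypothetical, and this is an illustration of the general technique, not the paper's optimized encoding or decoding algorithm.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_2d(grid):
    """For a rows x cols grid of data blocks, compute one XOR parity block
    per row and one per column (the two-dimensional parity layout)."""
    row_parity = [xor_blocks(row) for row in grid]
    col_parity = [xor_blocks(list(col)) for col in zip(*grid)]
    return row_parity, col_parity


def recover(grid, row_parity, col_parity):
    """Fill erased blocks (None) in place. Any row or column with exactly
    one missing block is repaired from its survivors plus its parity; passes
    repeat until no further progress. Returns True if every block is restored."""
    progress = True
    while progress:
        progress = False
        for r, row in enumerate(grid):
            missing = [c for c, b in enumerate(row) if b is None]
            if len(missing) == 1:
                survivors = [b for b in row if b is not None]
                row[missing[0]] = xor_blocks(survivors + [row_parity[r]])
                progress = True
        for c in range(len(grid[0])):
            missing = [r for r in range(len(grid)) if grid[r][c] is None]
            if len(missing) == 1:
                survivors = [grid[r][c] for r in range(len(grid)) if grid[r][c] is not None]
                grid[missing[0]][c] = xor_blocks(survivors + [col_parity[c]])
                progress = True
    return all(b is not None for row in grid for b in row)


# Usage: a 2 x 3 grid of blocks, with two erasures in row 0 and one in row 1.
data = [[b"AAAA", b"BBBB", b"CCCC"],
        [b"DDDD", b"EEEE", b"FFFF"]]
rows, cols = encode_2d(data)
damaged = [row[:] for row in data]
damaged[0][0] = None
damaged[0][1] = None
damaged[1][1] = None
ok = recover(damaged, rows, cols)
```

In this sketch, row 1 is repaired first from its row parity, which leaves columns 0 and 1 with a single gap each, so the column parities finish the job. The layout costs r + c parity blocks for r × c data blocks. Any single erasure is recoverable. However, four erasures at the corners of a rectangle leave every affected row and column with two gaps, so they cannot be recovered this way. This illustrates the redundancy versus fault-tolerance trade-off the abstract mentions.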
Pages: 1237-1252
Page count: 16