Dynamic erasure coding decision for modern block-oriented distributed storage systems

Cited by: 3
Authors
Ahn, Hoo-Young [1]
Lee, Kyong-Ha [2]
Lee, Yoon-Joon [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Comp, 291 Daehak Ro, Taejon 305701, South Korea
[2] KISTI, Sci Data Res Ctr, 245 Daehak Ro, Daejeon 305806, South Korea
Source
JOURNAL OF SUPERCOMPUTING | 2016 / Vol. 72 / Issue 04
Keywords
Distributed storage system; Storage overhead; Hadoop; HDFS; Data replication; Erasure coding; RAID
DOI
10.1007/s11227-016-1661-7
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Modern block-oriented distributed storage systems such as the Hadoop distributed file system have proliferated in this era of big data and cloud computing. These systems feature block-level replication: files are partitioned into equal-sized blocks, and multiple copies of each block are then arbitrarily distributed across nodes for fault tolerance and data availability. Under this strategy, however, much storage space is wasted on keeping copies of blocks whose data may not be accessed frequently. Distributed storage systems have therefore begun to adopt erasure codes. However, classical parity encoding schemes are hard to apply directly to these systems, since block copies are placed arbitrarily across nodes. We present a novel technique, called DynaEC, to address these issues in modern block-oriented distributed storage systems. DynaEC provides a unique parity encoding algorithm that encodes data blocks arbitrarily distributed across machines into parities and then places the parities so that fault tolerance is guaranteed. Parity encoding in DynaEC is performed without any change to the original block placement policy of the Hadoop distributed file system, which lets DynaEC work seamlessly with it. Finally, during the encoding procedure each data node encodes its own data blocks, without requiring any information about blocks located on other data nodes. As such, the encoding procedure in DynaEC is performed fully in parallel without any synchronization issues. With extensive experiments, we show that DynaEC saves storage space up to the theoretical limit while outperforming previous approaches by multiple orders of magnitude.
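As a rough illustration of the trade-off the abstract describes, the sketch below uses plain single-parity XOR in the RAID style. It is not the DynaEC algorithm, whose encoding and parity placement rules are not given in the abstract. The function names `xor_parity`, `recover`, and `storage_overhead` are hypothetical and chosen for this example only. The sketch shows that one parity block can rebuild one lost data block, and how the storage overhead of replication compares with that of parity.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equal-sized blocks byte by byte into one parity block."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must be equal-sized")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_parity(surviving + [parity])


def storage_overhead(data_blocks: int, extra_blocks: int) -> float:
    """Raw storage used per unit of user data (assumed metric for this sketch)."""
    return (data_blocks + extra_blocks) / data_blocks


# Hypothetical example: three equal-sized data blocks and one parity block.
blocks = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(blocks)
rebuilt = recover([blocks[0], blocks[2]], parity)

replication_cost = storage_overhead(1, 2)  # one block plus two extra copies
parity_cost = storage_overhead(3, 1)       # three blocks plus one parity
```

In this sketch the parity is computed only from the blocks passed in, so no coordination with other holders of data is needed. To survive a node failure, the parity would have to be stored on a different node from the blocks it protects. How DynaEC chooses that placement is outside what the abstract states.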
Pages: 1312-1341
Number of pages: 30