Dynamic erasure coding decision for modern block-oriented distributed storage systems

Cited by: 3
Authors
Ahn, Hoo-Young [1]
Lee, Kyong-Ha [2]
Lee, Yoon-Joon [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Comp, 291 Daehak Ro, Taejon 305701, South Korea
[2] KISTI, Sci Data Res Ctr, 245 Daehak Ro, Daejeon 305806, South Korea
Keywords
Distributed storage system; Storage overhead; Hadoop; HDFS; Data replication; Erasure coding; RAID
DOI
10.1007/s11227-016-1661-7
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Modern block-oriented distributed storage systems such as the Hadoop Distributed File System (HDFS) have proliferated in the era of big data and cloud computing. These systems feature block-level replication: files are partitioned into equal-sized blocks, and multiple copies of each block are distributed arbitrarily across nodes for fault tolerance and data availability. Under this strategy, however, much storage capacity is wasted on keeping copies of blocks whose data may be accessed only infrequently. Distributed storage systems have therefore begun to adopt erasure codes. Classical parity encoding schemes, however, are hard to apply directly to these systems because block copies are placed arbitrarily across nodes. We present a novel technique, called DynaEC, to address these issues in modern block-oriented distributed storage systems. DynaEC provides a unique parity encoding algorithm that encodes data blocks arbitrarily distributed across machines into parities and then places the parities so that fault tolerance is guaranteed. Parity encoding in DynaEC is performed without any change to the original block placement policy of HDFS, so DynaEC works seamlessly with HDFS. Finally, during the encoding procedure each data node encodes its own data blocks without requiring any information about blocks located on other data nodes. As such, the encoding procedure in DynaEC is performed fully in parallel without any synchronization issues. Through extensive experiments, we show that DynaEC reduces storage usage up to the theoretical limit while outperforming previous approaches by multiple orders of magnitude.
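To make the storage trade-off in the abstract concrete, the following is a minimal sketch of generic single-parity (RAID-5-style) XOR encoding over equal-sized blocks, together with the storage overhead arithmetic. It is not DynaEC's algorithm, which the abstract does not specify; the function names, the tiny 4-byte blocks, and the choice of three data blocks per parity are all assumptions made for illustration only.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-sized blocks byte by byte into a single parity block."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must be equal-sized")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover(surviving_blocks, parity):
    """Rebuild the one missing block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


def storage_overhead(data_blocks, redundant_blocks):
    """Total stored blocks divided by original data blocks."""
    return (data_blocks + redundant_blocks) / data_blocks


# Hypothetical 4-byte blocks; real HDFS blocks are far larger.
blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(blocks)

# Simulate losing the second block and rebuilding it.
restored = recover([blocks[0], blocks[2]], parity)

replication_cost = storage_overhead(1, 2)  # 3 copies of every block
parity_cost = storage_overhead(3, 1)       # 3 data blocks + 1 parity block
```

Under these assumptions, three-way replication stores 3 blocks per data block, while one parity per three data blocks stores about 1.33, at the price of tolerating only a single lost block per group.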
Pages: 1312-1341
Page count: 30