Dynamic erasure coding decision for modern block-oriented distributed storage systems

Cited by: 3
Authors
Ahn, Hoo-Young [1]
Lee, Kyong-Ha [2]
Lee, Yoon-Joon [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Comp, 291 Daehak Ro, Taejon 305701, South Korea
[2] KISTI, Sci Data Res Ctr, 245 Daehak Ro, Daejeon 305806, South Korea
Keywords
Distributed storage system; Storage overhead; Hadoop; HDFS; Data replication; Erasure coding; RAID
DOI
10.1007/s11227-016-1661-7
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Modern block-oriented distributed storage systems such as the Hadoop Distributed File System (HDFS) have proliferated in this era of big data and cloud computing. These systems feature block-level replication, in which files are partitioned into equal-sized blocks and multiple copies of each block are then arbitrarily distributed across nodes for fault tolerance and data availability. However, under this strategy much storage capacity is wasted on block copies whose data may not be accessed frequently. Therefore, distributed storage systems have begun to adopt erasure codes. However, classical parity encoding schemes are hard to apply directly to these systems, since block copies are arbitrarily placed across nodes. We present a novel technique, called DynaEC, to address these issues in modern block-oriented distributed storage systems. DynaEC provides a unique parity encoding algorithm that encodes data blocks arbitrarily distributed across machines into parities and then places the parities so that fault tolerance is guaranteed. Parity encoding in DynaEC is performed without any change to the original block placement policy of HDFS, which lets DynaEC work seamlessly with HDFS. Finally, during the encoding procedure each data node encodes its own data blocks, without requiring any information about blocks located on other data nodes. As such, the encoding procedure in DynaEC is performed fully in parallel without any synchronization issues. Through extensive experiments, we show that DynaEC saves storage volume up to the theoretical limit while outperforming previous approaches by multiple orders of magnitude.
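The trade-off the abstract describes, replication versus parity, can be illustrated with a minimal sketch. This is not DynaEC's algorithm, which the abstract does not spell out. It assumes the simplest possible code, a single XOR parity block over equal-sized data blocks, and the function names (`xor_parity`, `recover_block`, `storage_ratio`) are hypothetical.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equal-sized blocks byte by byte into a single parity block."""
    if not blocks:
        raise ValueError("need at least one block")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be equal-sized")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover_block(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_parity(surviving + [parity])


def storage_ratio(data_blocks: int, redundant_blocks: int) -> float:
    """Blocks stored per block of user data."""
    return (data_blocks + redundant_blocks) / data_blocks


# Illustrative data only: three tiny "blocks" held by one data node.
blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(blocks)

# Losing any one block is recoverable from the other two plus the parity.
restored = recover_block([blocks[0], blocks[2]], parity)
assert restored == blocks[1]

# Three-way replication keeps 2 extra copies per block; parity keeps 1 extra block in total.
print(storage_ratio(3, 6))  # replication: 9 blocks stored for 3 blocks of data
print(storage_ratio(3, 1))  # single parity: 4 blocks stored for 3 blocks of data
```

A single XOR parity tolerates only one lost block per group, so a real scheme also has to decide which blocks share a parity and where that parity is stored. That placement problem is what the abstract says DynaEC addresses.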
Pages: 1312-1341
Page count: 30