BPR: An Erasure Coding Batch Parallel Repair Approach in Distributed Storage Systems

Cited by: 4

Authors:

Song, Ying ^{[1,2,3]}

Zhao, Wenxuan ^{[1,2]}

Wang, Bo ^{[4]}

Affiliations:

[1] Beijing Informat Sci & Technol Univ, Beijing Key Lab Internet Culture & Digital Dissemi, Beijing 100101, Peoples R China

[2] Beijing Informat Sci & Technol Univ, Beijing Adv Innovat Ctr Mat Genome Engn, Beijing 100101, Peoples R China

[3] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100086, Peoples R China

[4] Zhengzhou Univ Light Ind, Software Engn Coll, Zhengzhou 450002, Peoples R China

Source:

IEEE ACCESS | 2023 / Vol. 11

Funding:

National Natural Science Foundation of China;

Keywords:

Distributed processing; Business process re-engineering; Traffic congestion; Encoding; Full-duplex system; Decoding; Bandwidth; Storage management; Distributed storage system; erasure coding; data recovery; EFFICIENT; SCHEME; DESIGN; CODES;

DOI:

10.1109/ACCESS.2023.3257404

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Erasure coding is widely used in distributed systems because it improves reliability for large volumes of data at low storage overhead. However, when a distributed system loses data across a large number of stripes and must recover them in batches, current recovery methods either repeat the single-stripe recovery procedure or optimize recovery for only a subset of the stripes. Both approaches incur heavy upload and download repair traffic and imbalanced load, which reduces the efficiency of fault recovery and wastes resources. In this paper, we propose BPR, an erasure coding batch parallel repair approach for distributed storage systems. BPR reduces cross-rack network transfer time and increases recovery throughput by classifying the stripes and recovering their data in batches through forward and reverse parallel data recovery. Experimental results show that, for large-scale stripe recovery, BPR reduces the cross-rack network transfer time by up to 10% and increases the recovery throughput by up to 8% compared with rPDL in some scenarios.

Citation

Pages: 44509-44518

Page count: 10

References (21 in total):

[1] [Anonymous], 2010, PROC 9 USENIX S OPER

[2] Fast Recovery Techniques for Erasure-coded Clusters in Non-uniform Traffic Network [J]. Bai, Yunren; Xu, Zihan; Wang, Haixia; Wang, Dongsheng. PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019), 2019

[3] Benson Theophilus, 2010, ACM Internet Measurement Conference (IMC), P267, DOI 10.1145/1879141.1879175

[4] Chowdhury M, 2016, 13TH USENIX SYMPOSIUM ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION (NSDI '16), P407

[5] Leveraging Endpoint Flexibility in Data-Intensive Clusters [J]. Chowdhury, Mosharaf; Kandula, Srikanth; Stoica, Ion. ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2013, 43 (04): 231-242

[6] Optimal Repair Layering for Erasure-Coded Data Centers: From Theory to Practice [J]. Hu, Yuchong; Li, Xiaolu; Zhang, Mi; Lee, Patrick P. C.; Zhang, Xiaoyang; Zhou, Pan; Feng, Dan. ACM TRANSACTIONS ON STORAGE, 2017, 13 (04)

[7] An Ant Colony Optimization Based Data Update Scheme for Distributed Erasure-Coded Storage Systems [J]. Hu, Yupeng; Li, Qian; Xie, Wei; Ye, Zhenyu. IEEE ACCESS, 2020, 8: 118696-118706

[8] Huang Cheng, 2012, P USENIX C ANN TECH, P2

[9] Enabling Efficient and Reliable Transition from Replication to Erasure Coding for Clustered File Systems [J]. Li, Runhui; Hu, Yuchong; Lee, Patrick P. C. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (09): 2500-2513

[10] Liu He, 2015, P 11 ACM C EM NETW E, P41
