In-network aggregation enabled multiple sub-blocks parallel repair in erasure-coded storage system

Cited by: 0

Authors:

Liu, Lei [1,2]

Wang, Yong [1]

Liang, Yangfan [3]

Chen, Junqi [1,4]

He, Qian [1,4]

Affiliations:

[1] School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin

[2] School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin

[3] College of Information Science and Engineering, Jiaxing University, Jiaxing

[4] Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin

Source:

Computer Networks | 2025 / Vol. 270

Funding:

National Natural Science Foundation of China

Keywords:

Distributed storage system; Erasure coding; Fault tolerance; In-network aggregation; Programmable switch

DOI:

10.1016/j.comnet.2025.111523


Abstract:

Erasure coding has gained widespread adoption in large-scale distributed storage systems because it significantly reduces storage overhead while ensuring high reliability. However, repairing failed data in erasure-coded systems requires retrieving data from multiple nodes, which generates substantial network traffic and often leads to incast congestion and degraded repair performance. Existing solutions alleviate requester-side congestion by offloading aggregation operations to helpers (nodes that provide repair data), but they inevitably increase inter-helper traffic and still struggle to fully utilize global network resources. To this end, we propose InaPR (In-network Aggregation Enabled Parallel Repair for Multiple Sub-Blocks), a framework that leverages programmable switches to perform in-network aggregation during data repair. InaPR decomposes a data repair task into multiple tree-structured pipelines, enabling a repair to collect source data from more helpers than the fixed k nodes otherwise required. The bandwidth allocation for each pipeline is then optimized through a two-stage methodology: (1) a heuristic helper allocation strategy that assigns high-bandwidth helpers across multiple pipelines while distributing low-capacity ones among distinct pipelines; (2) a throughput-maximizing bandwidth allocation formulated as a linear programming model. We further extend the architecture to cross-rack scenarios through virtual node decomposition. Finally, we prototype InaPR using a P4-programmable switch and validate its performance in real-world evaluations and multi-rack simulations. Experimental results demonstrate that InaPR achieves 6.74% higher repair throughput than state-of-the-art methods in single-rack prototype tests and an 11.03% improvement in cross-rack simulations. © 2025 Elsevier B.V.
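Illustrative note (not from the paper): the abstract relies on the fact that erasure-code repair is a linear combination of helper blocks, so partial results can be combined anywhere along the path before reaching the requester. The sketch below is a minimal illustration of that property only, assuming a single XOR parity over k = 4 data blocks in place of a general Reed-Solomon code; the function names and block sizes are hypothetical, and nothing here reflects InaPR's actual pipelines, helper allocation, or P4 implementation.

```python
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks (addition in GF(2^8))."""
    return bytes(x ^ y for x, y in zip(a, b))

def aggregate(blocks):
    """Combine any number of blocks into one partial result."""
    return reduce(xor_blocks, blocks)

# Assumed toy code: k = 4 data blocks plus one XOR parity block.
data = [bytes([i] * 8) for i in (1, 2, 3, 4)]
parity = aggregate(data)

# Suppose data[2] is lost; the helpers are the survivors plus parity.
helpers = [data[0], data[1], data[3], parity]

# Requester-side repair: all four helper blocks converge on one link.
direct = aggregate(helpers)

# Tree-style aggregation: two intermediate points each combine two
# helper blocks, so only partial results travel further upstream.
left = aggregate(helpers[:2])
right = aggregate(helpers[2:])
in_network = xor_blocks(left, right)

# Both orders of combination rebuild the same lost block.
assert direct == in_network == data[2]
```

Because the combination is associative and commutative, the order and location of the partial sums do not change the repaired block, which is what allows aggregation to be moved off the requester.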
