In-network aggregation enabled multiple sub-blocks parallel repair in erasure-coded storage system

Authors
Liu, Lei [1, 2]
Wang, Yong [1]
Liang, Yangfan [3]
Chen, Junqi [1, 4]
He, Qian [1, 4]
Affiliations
[1] School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin
[2] School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin
[3] College of Information Science and Engineering, Jiaxing University, Jiaxing
[4] Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin
Funding
National Natural Science Foundation of China
Keywords
Distributed storage system; Erasure coding; Fault tolerance; In-network aggregation; Programmable switch
DOI
10.1016/j.comnet.2025.111523
Abstract
Erasure coding has been widely adopted in large-scale distributed storage systems because it significantly reduces storage overhead while ensuring high reliability. However, repairing failed data in erasure-coded systems requires retrieving data from multiple nodes, which generates substantial network traffic and often leads to incast congestion and degraded repair performance. Existing solutions alleviate requester-side congestion by offloading aggregation operations to helpers (nodes that provide repair data), but they inevitably increase inter-helper traffic and still struggle to fully utilize global network resources. To this end, we propose InaPR (In-network Aggregation Enabled Parallel Repair for Multiple Sub-Blocks), a framework that leverages programmable switches to perform in-network aggregation during data repair. InaPR decomposes a data repair task into multiple tree-structured pipelines, enabling a repair to collect source data from more helpers than the fixed k nodes otherwise required. The bandwidth allocation for each pipeline is then optimized through a two-stage methodology: (1) a heuristic helper allocation strategy that assigns high-bandwidth helpers across multiple pipelines while distributing low-capacity ones among distinct pipelines; (2) a throughput-maximizing bandwidth allocation formulated as a linear programming model. We also extend the architecture to cross-rack scenarios through virtual node decomposition. Finally, we prototype InaPR using a P4-programmable switch and validate its performance in real-world evaluations and multi-rack simulations. Experimental results demonstrate that InaPR achieves 6.74% higher repair throughput than state-of-the-art methods in single-rack prototype tests and an 11.03% improvement in cross-rack simulations. © 2025 Elsevier B.V.
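The idea of aggregating repair data inside the network, one sub-block per pipeline, can be illustrated with a minimal sketch. This is not the InaPR implementation: it assumes a single XOR parity code (a general erasure code would first multiply each helper's data by a Galois-field coefficient), it ignores bandwidth allocation and helpers beyond k, and the function names `xor_blocks`, `aggregate`, `split`, and `repair_in_pipelines` are hypothetical. The `aggregate` call stands in for the operation a programmable switch would perform on packets in flight.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def aggregate(blocks: list[bytes]) -> bytes:
    """Stand-in for switch-side aggregation: combine all inputs into one block."""
    return reduce(xor_blocks, blocks)


def split(block: bytes, parts: int) -> list[bytes]:
    """Cut a block into `parts` equal sub-blocks (assumes an even split)."""
    if len(block) % parts != 0:
        raise ValueError("block length must be divisible by the number of parts")
    size = len(block) // parts
    return [block[i * size:(i + 1) * size] for i in range(parts)]


def repair_in_pipelines(helper_blocks: list[bytes], pipelines: int) -> bytes:
    """Repair a lost block one sub-block per pipeline.

    Each pipeline aggregates the matching sub-block from every helper,
    so the requester receives one already-combined sub-block per pipeline
    instead of one full block per helper.
    """
    sub_blocks = [split(block, pipelines) for block in helper_blocks]
    repaired = [
        aggregate([helper[p] for helper in sub_blocks])
        for p in range(pipelines)
    ]
    return b"".join(repaired)


if __name__ == "__main__":
    d0, d1, d2 = b"ABCDEFGH", b"12345678", b"abcdefgh"
    parity = aggregate([d0, d1, d2])
    # Suppose d1 is lost; the surviving blocks act as helpers.
    print(repair_in_pipelines([d0, d2, parity], pipelines=2) == d1)
```

Because the aggregation is linear, combining sub-blocks at an intermediate point gives the same result as combining whole blocks at the requester, which is why the requester-side incast can be avoided.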