Turbo: Efficient Communication Framework for Large-scale Data Processing Cluster

Cited by: 0
Authors
Jia, Xuya [1]
Yao, Zhiyi [1,2]
Peng, Chao [1,2]
Zhao, Zihao [3]
Lei, Bin [3]
Liu, Edison [3]
Li, Xiang [1]
He, Zekun [1]
Wang, Yachen [1]
Zou, Xianneng [1]
Zhao, Chongqing [1]
Chu, Jinhui [1]
Wang, Jilong [4]
Miao, Congcong [1]
Affiliations
[1] Tencent, Shenzhen, China
[2] Fudan University, Shanghai, China
[3] NVIDIA, Santa Clara, CA, USA
[4] Tsinghua University, Beijing, China
Source
Proceedings of the ACM SIGCOMM 2024 Conference (ACM SIGCOMM 2024), 2024
Keywords
Remote Direct Memory Access; Load balance; Communication middleware; Reliability; RDMA
DOI
10.1145/3651890.3672241
CLC classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Big data processing clusters suffer from long job completion times due to inefficient utilization of RDMA capabilities. Our production measurements in a large-scale cluster with hundreds of server nodes processing large-scale jobs show that the existing RDMA deployment results in a long tail of job completion times, with some jobs taking more than twice the average time to complete. In this paper, we present the design and implementation of Turbo, an efficient communication framework that enables large-scale data processing clusters to achieve high performance and scalability. The core of Turbo's approach is to leverage a dynamic block-level flowlet transmission mechanism and a non-blocking communication middleware to improve network throughput and enhance the system's scalability. Furthermore, Turbo ensures high system reliability by utilizing an external shuffle service and falling back to TCP as a backup. We integrate Turbo into Apache Spark and evaluate it on a small-scale testbed and on a large-scale cluster consisting of hundreds of server nodes. The small-scale testbed results show that Turbo improves network throughput by 15.1% while maintaining high system reliability. The large-scale production results show that Turbo reduces job completion time by 23.9% and increases the job completion rate by 2.03x over existing RDMA solutions.
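To make the mechanism named in the abstract more concrete, below is a minimal illustrative sketch (in Scala, since Turbo is integrated into Apache Spark) of what block-level flowlet transmission with non-blocking sends could look like. All names here (Flowlet, PathSender, FlowletScheduler, the gap threshold) are hypothetical and are not taken from Turbo's implementation or API; the sketch only shows the general idea of splitting a shuffle block into flowlets and re-selecting the transmission path when an idle gap makes cross-path reordering unlikely.

    // Hypothetical sketch, not Turbo's actual API.
    final case class Flowlet(blockId: Long, offset: Int, payload: Array[Byte])

    // One transmission path (e.g., one RDMA queue pair / ECMP path).
    // sendAsync is assumed to be non-blocking, e.g. it only posts a work request.
    trait PathSender {
      def sendAsync(f: Flowlet): Unit
    }

    // Splits a shuffle block into flowlets and re-selects the path whenever the
    // idle gap since the previous send exceeds a threshold, so that packets of
    // different flowlets are unlikely to arrive out of order across paths.
    class FlowletScheduler(paths: IndexedSeq[PathSender],
                           flowletBytes: Int,
                           gapThresholdNanos: Long) {
      private var currentPath = 0
      private var lastSendNanos = 0L

      def sendBlock(blockId: Long, block: Array[Byte]): Unit = {
        var offset = 0
        while (offset < block.length) {
          val now = System.nanoTime()
          if (now - lastSendNanos > gapThresholdNanos)
            currentPath = (currentPath + 1) % paths.length // flowlet boundary: pick a new path
          val len = math.min(flowletBytes, block.length - offset)
          val chunk = java.util.Arrays.copyOfRange(block, offset, offset + len)
          paths(currentPath).sendAsync(Flowlet(blockId, offset, chunk))
          lastSendNanos = now
          offset += len
        }
      }
    }

Under this reading, a receiver would reassemble flowlets by (blockId, offset), and the external shuffle service and TCP fallback mentioned in the abstract could plausibly sit behind the same sender interface as alternative backends; again, this is only an interpretation of the abstract, not a description of the paper's design.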
Pages: 540-553
Number of pages: 14