Resilient Datacenter Load Balancing in the Wild

Cited by: 169
Authors
Zhang, Hong [1]
Zhang, Junxue [1]
Bai, Wei [1]
Chen, Kai [1]
Chowdhury, Mosharaf [2]
Affiliations
[1] Hong Kong Univ Sci & Technol, SING Lab, Hong Kong, Hong Kong, Peoples R China
[2] Univ Michigan, Ann Arbor, MI 48109 USA
Source
SIGCOMM '17: PROCEEDINGS OF THE 2017 CONFERENCE OF THE ACM SPECIAL INTEREST GROUP ON DATA COMMUNICATION | 2017
Keywords
Datacenter fabric; Load balancing; Distributed
DOI
10.1145/3098822.3098841
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Production datacenters operate under various uncertainties such as traffic dynamics, topology asymmetry, and failures. Therefore, datacenter load balancing schemes must be resilient to these uncertainties; i.e., they should accurately sense path conditions and timely react to mitigate the fallouts. Despite significant efforts, prior solutions have important drawbacks. On the one hand, solutions such as Presto and DRB are oblivious to path conditions and blindly reroute at fixed granularity. On the other hand, solutions such as CONGA and CLOVE can sense congestion, but they can only reroute when flowlets emerge; thus, they cannot always react timely to uncertainties. To make things worse, these solutions fail to detect/handle failures such as blackholes and random packet drops, which greatly degrades their performance. In this paper, we introduce Hermes, a datacenter load balancer that is resilient to the aforementioned uncertainties. At its heart, Hermes leverages comprehensive sensing to detect path conditions including failures unattended before, and it reacts using timely yet cautious rerouting. Hermes is a practical edge-based solution with no switch modification. We have implemented Hermes with commodity switches and evaluated it through both testbed experiments and large-scale simulations. Our results show that Hermes achieves comparable performance to CONGA and Presto in normal cases, and well handles uncertainties: under asymmetries, Hermes achieves up to 10% and 20% better flow completion time (FCT) than CONGA and CLOVE; under switch failures, it outperforms all other schemes by over 32%.
Pages: 253-266
Number of pages: 14