Load Balancing in Compute Clusters With Delayed Feedback

被引:0
作者
Tahir, Anam [1 ]
Alt, Bastian [1 ]
Rizk, Amr [2 ]
Koeppl, Heinz [1 ]
机构
[1] Tech Univ Darmstadt, Dept Elect Engn & Informat Technol, D-64289 Darmstadt, Germany
[2] Univ Duisburg Essen, Commun Networks & Syst Lab, D-47057 Duisburg, Germany
关键词
Servers; Load modeling; Load management; Decision making; Time factors; Markov processes; Delays; Parallel systems; load balancing; partial observability; NETWORKS; QUEUE; GO;
D O I
10.1109/TC.2022.3215907
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Load balancing arises as a fundamental problem, underlying the dimensioning and operation of many computing and communication systems, such as job routing in data center clusters, multipath communication, Big Data and queueing systems. In essence, the decision-making agent maps each arriving job to one of the possibly heterogeneous servers while aiming at an optimization goal such as load balancing, low average delay or low loss rate. One main difficulty in finding optimal load balancing policies here is that the agent only partially observes the impact of its decisions, e.g., through the delayed acknowledgements of the served jobs. In this paper, we provide a partially observable (PO) model that captures the load balancing decisions in parallel buffered systems under limited information of delayed acknowledgements. We present a simulation model for this PO system to find a load balancing policy in real-time using a scalable Monte Carlo tree search algorithm. We numerically show that the resulting policy outperforms other limited information load balancing strategies such as variants of Join-the-Most-Observations and has comparable performance to full information strategies like: Join-the-Shortest-Queue, Join-the-Shortest-Queue(d) and Shortest-Expected-Delay. Finally, we show that our approach can optimise the real-time parallel processing by using network data provided by Kaggle.
引用
收藏
页码:1610 / 1622
页数:13
相关论文
共 42 条
  • [1] Transitions: A Protocol-Independent View of the Future Internet
    Alt, Bastian
    Weckesser, Markus
    Becker, Christian
    Hollick, Matthias
    Kar, Sounak
    Klein, Anja
    Klose, Robin
    Kluge, Roland
    Konppl, Heinz
    Koldehofe, Boris
    Khudabukhsh, Wasiur R.
    Luthra, Manisha
    Mousavi, Mahdi
    Muehlhaeuser, Max
    Pfannemueller, Martin
    Rizk, Amr
    Schuerr, Andy
    Steinmetz, Ralf
    [J]. PROCEEDINGS OF THE IEEE, 2019, 107 (04) : 835 - 846
  • [2] Altman E., 1992, Performance Evaluation Review, V20, P193, DOI 10.1145/149439.133106
  • [3] Exploiting Queuing Networks to Model and Assess the Performance of Self-Adaptive Software Systems: A Survey
    Arcelli, Davide
    [J]. 11TH INTERNATIONAL CONFERENCE ON AMBIENT SYSTEMS, NETWORKS AND TECHNOLOGIES (ANT) / THE 3RD INTERNATIONAL CONFERENCE ON EMERGING DATA AND INDUSTRY 4.0 (EDI40) / AFFILIATED WORKSHOPS, 2020, 170 : 498 - 505
  • [4] Armbrust M., 2009, Rep. UCB/EECS-2009-28
  • [5] ARTIGES D, 1993, PROCEEDINGS OF THE 32ND IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-4, P2737, DOI 10.1109/CDC.1993.325693
  • [6] Reinforcement learning in queues
    Ayesta, U.
    [J]. QUEUEING SYSTEMS, 2022, 100 (3-4) : 497 - 499
  • [7] Banawan S. A., 1989, IEEE INFOCOM'89 The Conference on Computer Communications. Proceedings of the Eighth Annual Joint Conference of the IEEE Computer and Communications Societies. Technology: Emerging or Converging? (IEEE Cat. No. 89CH2702-9), P731, DOI 10.1109/INFCOM.1989.101521
  • [8] BANAWAN SA, 1992, PROC ANNU SIMUL SYMP, P22, DOI 10.1109/SIMSYM.1992.227580
  • [9] DYNAMIC PROGRAMMING
    BELLMAN, R
    [J]. SCIENCE, 1966, 153 (3731) : 34 - &
  • [10] A primer on partially observable Markov decision processes (POMDPs)
    Chades, Iadine
    Pascal, Luz V.
    Nicol, Sam
    Fletcher, Cameron S.
    Ferrer-Mestres, Jonathan
    [J]. METHODS IN ECOLOGY AND EVOLUTION, 2021, 12 (11): : 2058 - 2072