Load Balancing in Compute Clusters With Delayed Feedback

被引:0
作者
Tahir, Anam [1 ]
Alt, Bastian [1 ]
Rizk, Amr [2 ]
Koeppl, Heinz [1 ]
机构
[1] Tech Univ Darmstadt, Dept Elect Engn & Informat Technol, D-64289 Darmstadt, Germany
[2] Univ Duisburg Essen, Commun Networks & Syst Lab, D-47057 Duisburg, Germany
关键词
Servers; Load modeling; Load management; Decision making; Time factors; Markov processes; Delays; Parallel systems; load balancing; partial observability; NETWORKS; QUEUE; GO;
D O I
10.1109/TC.2022.3215907
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Load balancing arises as a fundamental problem, underlying the dimensioning and operation of many computing and communication systems, such as job routing in data center clusters, multipath communication, Big Data and queueing systems. In essence, the decision-making agent maps each arriving job to one of the possibly heterogeneous servers while aiming at an optimization goal such as load balancing, low average delay or low loss rate. One main difficulty in finding optimal load balancing policies here is that the agent only partially observes the impact of its decisions, e.g., through the delayed acknowledgements of the served jobs. In this paper, we provide a partially observable (PO) model that captures the load balancing decisions in parallel buffered systems under limited information of delayed acknowledgements. We present a simulation model for this PO system to find a load balancing policy in real-time using a scalable Monte Carlo tree search algorithm. We numerically show that the resulting policy outperforms other limited information load balancing strategies such as variants of Join-the-Most-Observations and has comparable performance to full information strategies like: Join-the-Shortest-Queue, Join-the-Shortest-Queue(d) and Shortest-Expected-Delay. Finally, we show that our approach can optimise the real-time parallel processing by using network data provided by Kaggle.
引用
收藏
页码:1610 / 1622
页数:13
相关论文
共 42 条
  • [21] Liu B, 2019, ANN ALLERTON CONF, P663, DOI [10.1109/allerton.2019.8919665, 10.1109/ALLERTON.2019.8919665]
  • [22] The power of two choices in randomized load balancing
    Mitzenmacher, M
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2001, 12 (10) : 1094 - 1104
  • [23] Vien NA, 2015, AAAI CONF ARTIF INTE, P3613
  • [24] Anytime point-based approximations for large POMDPs
    Pineau, Joelle
    Gordon, Geoffrey
    Thrun, Sebastian
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2006, 27 : 335 - 380
  • [25] A comprehensive view of Hadoop research-A systematic literature review
    Polato, Ivanilton
    Re, Reginaldo
    Goldman, Alfredo
    Kon, Fabio
    [J]. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2014, 46 : 1 - 25
  • [26] Efficient randomized web-cache replacement schemes using samples from past eviction times
    Psounis, K
    Prabhakar, B
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2002, 10 (04) : 441 - 454
  • [27] Rojas J. S., LABELED NETWORK TRAF
  • [28] Ross S.M., 2014, INTRO PROBABILITY MO
  • [29] Online planning algorithms for POMDPs
    Ross, Stephane
    Pineau, Joelle
    Paquet, Sebastien
    Chaib-draa, Brahim
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2008, 32 : 663 - 704
  • [30] Probabilistic programming in Python']Python using PyMC3
    Salvatier, John
    Wiecki, Thomas, V
    Fonnesbeck, Christopher
    [J]. PEERJ COMPUTER SCIENCE, 2016,