Reinforcement Learning Framework for Server Placement and Workload Allocation in Multiaccess Edge Computing

Cited by: 21
Authors
Mazloomi, Anahita [1]
Sami, Hani
Bentahar, Jamal [1, 2]
Otrok, Hadi [2]
Mourad, Azzam [3, 4]
Affiliations
[1] Concordia Univ, Concordia Inst Informat Syst Engn, Montreal, PQ H3G 1M8, Canada
[2] Khalifa Univ, Ctr Cyber Phys Syst, Dept Elect Engn & Comp Sci, Abu Dhabi, U Arab Emirates
[3] Lebanese Amer Univ Div Sci, Lebanese Amer Univ, Cyber Secur Syst & Appl Res Ctr, Dept CSM, Beirut 10017, Lebanon
[4] New York Univ Abu Dhabi, Div Sci, Abu Dhabi, U Arab Emirates
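Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Base station allocation; edge server placement; multiaccess edge computing (MEC); Q-learning; reinforcement learning (RL); TD(λ);
DOI
10.1109/JIOT.2022.3205051
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract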
Cloud computing is a reliable solution for providing distributed computation power. However, real-time response remains challenging given the enormous amount of data generated by IoT devices in 5G and 6G networks. Thus, multiaccess edge computing (MEC), which distributes edge servers in the proximity of end users to achieve low latency alongside higher processing power, is increasingly becoming a vital factor for the success of modern applications. This article addresses the problem of minimizing both the network delay, which is the main objective of MEC, and the number of edge servers, in order to provide an MEC design with minimum cost. This MEC design consists of edge server placement and base station allocation, which makes it a joint combinatorial optimization problem (COP). Recently, reinforcement learning (RL) has shown promising results for COPs. However, modeling real-world problems using RL when the state and action spaces are large still needs investigation. We propose a novel RL framework with an efficient representation and modeling of the state space, action space, and penalty function in the design of the underlying Markov decision process (MDP) for solving our problem. This modeling makes temporal difference (TD) learning applicable to a large-scale real-world problem while minimizing the cost of network design. We introduce the TD(λ) with eligibility traces for minimizing the cost (TDMC) algorithm, in addition to Q-learning for the same problem (QMC) when λ = 0. Furthermore, we discuss the impact of state representation, action space, and penalty function on the convergence of each model. Extensive experiments using real-world data sets from Shanghai Telecommunication and Citywide Public Computer Centers demonstrate that, given an efficient model, TDMC/QMC are able to find the actions that lead to a lower delayed penalty. The reported results show that our algorithm outperforms the other benchmarks by creating a tradeoff among multiple objectives.
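The relationship the abstract draws between TD(λ) with eligibility traces and Q-learning at λ = 0 can be illustrated with a generic tabular sketch. The code below is not the authors' TDMC/QMC implementation and does not reproduce their state, action, or penalty modeling. It is a textbook Watkins-style Q(λ) update that minimizes penalty instead of maximizing reward, run on a hypothetical four-state toy problem. The names `toy_step` and `q_lambda`, the penalties, and all hyperparameters are assumptions made for illustration only.

```python
import random


def toy_step(state, action):
    """Hypothetical toy MDP (an assumption, not from the paper).

    From state 0, action 0 costs nothing now but leads to state 1,
    where any action costs 10. Action 1 costs 1 now and leads to
    state 2, where any action costs 1. State 3 is terminal.
    Returns (next_state, penalty, done).
    """
    if state == 0:
        return (1, 0.0, False) if action == 0 else (2, 1.0, False)
    if state == 1:
        return (3, 10.0, True)
    return (3, 1.0, True)


def q_lambda(step, n_states=4, n_actions=2, episodes=500,
             alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1, seed=0):
    """Tabular Watkins-style Q(lambda) that minimizes accumulated penalty."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]

    def greedy(s):
        # Lowest estimated penalty; ties go to the lowest action index.
        return min(range(n_actions), key=lambda a: q[s][a])

    def choose(s):
        # Epsilon-greedy exploration.
        return rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)

    for _ in range(episodes):
        # Eligibility traces are reset at the start of every episode.
        e = [[0.0] * n_actions for _ in range(n_states)]
        s, done = 0, False
        a = choose(s)
        while not done:
            s2, penalty, done = step(s, a)
            best2 = greedy(s2)
            a2 = choose(s2)
            target = penalty if done else penalty + gamma * q[s2][best2]
            delta = target - q[s][a]
            e[s][a] += 1.0  # accumulating trace
            keep = (a2 == best2)  # cut traces after an exploratory action
            for i in range(n_states):
                for j in range(n_actions):
                    q[i][j] += alpha * delta * e[i][j]
                    e[i][j] = gamma * lam * e[i][j] if keep else 0.0
            s, a = s2, a2
    return q


if __name__ == "__main__":
    for lam in (0.0, 0.8):
        q = q_lambda(toy_step, lam=lam)
        print(lam, [round(v, 2) for v in q[0]])
```

With `lam=0.0` every trace is zeroed after each step, so only the visited state-action pair is updated and the procedure is one-step Q-learning. In this toy problem the action with the lower immediate penalty carries the higher delayed penalty, so the learned values in state 0 should favor action 1.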
Pages: 1376-1390
Page count: 15