A deep multi-agent reinforcement learning approach for the micro-service migration problem with affinity in the cloud

Cited by: 0
Authors
Ma, Ning [1]
Tang, Angjun [1]
Xiong, Zifeng [2]
Jiang, Fuxin [3]
Affiliations
[1] Xi'an Jiaotong University, School of Public Policy and Administration, Xi'an 710049, People's Republic of China
[2] Xi'an Jiaotong University, School of Software Engineering, Xi'an 710049, People's Republic of China
[3] ByteDance Inc., Beijing 100098, People's Republic of China
Keywords
Micro-service; Migration; Invoking traffic; Resource optimization; Deep reinforcement learning; Virtual machines; Live migration
DOI
10.1016/j.eswa.2025.126856
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the micro-service migration problem with affinity, which arises in the cloud computing industry. Because micro-services are periodically created and deleted to satisfy user demand, their deployment in the cloud must be adjusted regularly; this adjustment is referred to as micro-service migration. An optimal migration schedule should minimize the number of activated physical machines while maximizing the total internal invoking traffic (affinity). A cooperative multi-agent reinforcement learning (MARL) approach is proposed, enhanced by integrating Hindsight Reward Shaping and by fine-tuning the state encoder using a pre-trained ResNet model. The proposed MARL approach is validated on both synthetic datasets and real cloud traces from ByteDance and Alibaba, and is compared with four baseline algorithms: Migration Ant Colony Optimization, Migration Neighborhood Search, Single-Agent Reinforcement Learning, and the optimization solver CPLEX. Finally, an evaluation mechanism called Matching Score is proposed to explain the superior performance of MARL.
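The two objective terms named in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes that "internal invoking traffic" means traffic between micro-services placed on the same physical machine, and it ignores resource capacities and migration cost. The function name `evaluate_placement` and the data layout are hypothetical.

```python
from typing import Dict, Tuple


def evaluate_placement(
    placement: Dict[str, str],
    traffic: Dict[Tuple[str, str], float],
) -> Tuple[int, float]:
    """Return (active machine count, internal traffic) for a placement.

    placement maps a micro-service name to the machine hosting it.
    traffic maps a pair of micro-services to their invoking traffic volume.
    Assumption: traffic counts as "internal" only when both services
    of the pair are hosted on the same machine.
    """
    # A machine is active if it hosts at least one micro-service.
    active_machines = len(set(placement.values()))

    # Sum the traffic of every pair that is co-located.
    internal_traffic = sum(
        volume
        for (src, dst), volume in traffic.items()
        if placement[src] == placement[dst]
    )
    return active_machines, internal_traffic


# Hypothetical toy instance: three services on two machines.
placement = {"svc_a": "pm1", "svc_b": "pm1", "svc_c": "pm2"}
traffic = {("svc_a", "svc_b"): 5.0, ("svc_b", "svc_c"): 2.0}

before = evaluate_placement(placement, traffic)

# Migrating svc_c onto pm1 frees pm2 and makes all traffic internal.
after = evaluate_placement({**placement, "svc_c": "pm1"}, traffic)
```

In this toy instance a single migration improves both terms at once. In practice machine capacities limit how many services can be co-located, so the two goals generally have to be traded off against each other.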
Pages: 19