KPAMA: A Kubernetes based tool for Mitigating ML system Aging

Citations: 0
Authors
Ding, Wenjie [1]
Liu, Zhihao [1]
Lu, Xuhui [1]
Du, Xiaoting [2]
Zheng, Zheng [1]
Affiliations
[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100091, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Comp Sci, Beijing 100083, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Kubernetes-based machine learning system; Software aging; Data prediction; Autoscaling; Cloud
DOI
10.1016/j.jss.2025.112389
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
As machine learning (ML) systems continue to evolve and find wider application, their user bases and system sizes expand, a trend that is particularly evident with the widespread adoption of large language models. The infrastructure supporting ML systems, such as cloud services and computing hardware, is increasingly foundational to the ML system environment and is widely adopted to support continuous training and inference services. Nevertheless, it has been shown that increased data volume, computational complexity, and extended run times challenge the stability, efficiency, and availability of ML systems, precipitating system aging. To address this issue, we develop a novel solution, KPAMA, which leverages Kubernetes, the leading container orchestration platform, to enhance the autoscaling of computing workflows and resources and thereby mitigate system aging. KPAMA employs a hybrid model to predict key aging metrics and uses decision and anti-oscillation algorithms to autoscale system resources. Our experiments indicate that KPAMA markedly mitigates system aging and enhances task reliability compared with the standard Horizontal Pod Autoscaler and with systems without scaling capabilities.
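The abstract does not spell out the decision or anti-oscillation algorithms, so the following is only an illustrative sketch and not KPAMA's method. It assumes the predicted aging metric is a single utilization-like number, applies the proportional rule used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × metric / target), with a tolerance band), and damps oscillation with a stabilization window that scales down only to the highest recent recommendation. The names `desired_replicas` and `StabilizedScaler`, the window length, and the metric values are all hypothetical.

```python
import math
from collections import deque


def desired_replicas(current_replicas, predicted_metric, target_metric, tolerance=0.1):
    """HPA-style proportional rule applied to a predicted metric (assumed input)."""
    ratio = predicted_metric / target_metric
    # Within the tolerance band around the target, keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return max(1, math.ceil(current_replicas * ratio))


class StabilizedScaler:
    """Hypothetical anti-oscillation wrapper: use the highest recent recommendation."""

    def __init__(self, window=3):
        # Keeps only the last `window` recommendations.
        self.recent = deque(maxlen=window)

    def decide(self, current_replicas, predicted_metric, target_metric):
        self.recent.append(
            desired_replicas(current_replicas, predicted_metric, target_metric)
        )
        # Scale-ups take effect at once; scale-downs wait until every
        # higher recommendation has aged out of the window.
        return max(self.recent)


scaler = StabilizedScaler(window=3)
replicas = 2
history = []
for predicted in (90, 30, 30, 30):  # made-up predicted utilization, target is 60
    replicas = scaler.decide(replicas, predicted, target_metric=60)
    history.append(replicas)
print(history)
```

In this run the scaler grows from 2 to 3 replicas on the first high prediction and holds 3 replicas through two low predictions before dropping to 2, which is the kind of damping an anti-oscillation step is meant to provide.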
Pages: 13