Demystifying deep learning in predictive monitoring for cloud-native SLOs

Cited by: 1
Authors
Morichetta, Andrea [1]
Pujol, Victor Casamayor [1]
Nastic, Stefan [1]
Pusztai, Thomas [1]
Raith, Philipp [1]
Dustdar, Schahram [1]
Vij, Deepak [2]
Xiong, Ying [2]
Zhang, Zhaobo [2]
Affiliations
[1] TU Wien, Distributed Syst Grp, Vienna, Austria
[2] Futurewei Technol Inc, Santa Clara, CA USA
Keywords
workload prediction; neural networks; cloud; LSTM; Transformers; HOST LOAD PREDICTION; WORKLOAD; MODEL
DOI
10.1109/CLOUD60044.2023.00013
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The complexity inherent in managing cloud computing systems calls for novel solutions that can enforce high-level Service Level Objectives (SLOs) effectively and promptly. Unfortunately, most current SLO management solutions rely on reactive approaches, i.e., they correct SLO violations only after they have occurred. Further, the few methods that explore predictive techniques to prevent SLO violations focus solely on forecasting low-level system metrics, such as CPU and memory utilization. Although valid in some cases, these metrics do not necessarily provide clear and actionable insights into application behavior. This paper presents a novel approach that directly predicts high-level SLOs using low-level system metrics. We target this goal by training and optimizing two state-of-the-art neural network models: a Long Short-Term Memory (LSTM)-based model and a Transformer-based model. Our models provide actionable insights into application behavior by establishing proper connections between the evolution of low-level workload-related metrics and the high-level SLOs. We demonstrate our approach to selecting and preparing the data. We show in practice how to optimize the LSTM and the Transformer by targeting efficiency as a high-level SLO metric, and we perform a comparative analysis. We show how these models behave when the input workloads come from different distributions and, consequently, demonstrate their ability to generalize in heterogeneous systems. Finally, we operationalize our two models by integrating them into the Polaris framework, which we have been developing to enable a performance-driven, SLO-native approach to cloud computing.
Pages: 24-34
Number of pages: 11