Demystifying deep learning in predictive monitoring for cloud-native SLOs

Cited by: 1
Authors
Morichetta, Andrea [1]
Pujol, Victor Casamayor [1]
Nastic, Stefan [1]
Pusztai, Thomas [1]
Raith, Philipp [1]
Dustdar, Schahram [1]
Vij, Deepak [2]
Xiong, Ying [2]
Zhang, Zhaobo [2]
Affiliations
[1] TU Wien, Distributed Syst Grp, Vienna, Austria
[2] Futurewei Technol Inc, Santa Clara, CA USA
Keywords
workload prediction; neural networks; cloud; LSTM; Transformers; HOST LOAD PREDICTION; WORKLOAD; MODEL
DOI
10.1109/CLOUD60044.2023.00013
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The complexity inherent in managing cloud computing systems calls for novel solutions that can effectively and promptly enforce high-level Service Level Objectives (SLOs). Unfortunately, most current SLO management solutions rely on reactive approaches, i.e., correcting SLO violations only after they have occurred. Further, the few methods that explore predictive techniques to prevent SLO violations focus solely on forecasting low-level system metrics, such as CPU and memory utilization. Although valid in some cases, these metrics do not necessarily provide clear and actionable insights into application behavior. This paper presents a novel approach that directly predicts high-level SLOs using low-level system metrics. We target this goal by training and optimizing two state-of-the-art neural network models: a Long Short-Term Memory (LSTM)-based model and a Transformer-based model. Our models provide actionable insights into application behavior by establishing proper connections between the evolution of low-level workload-related metrics and the high-level SLOs. We demonstrate our approach to selecting and preparing the data. We show in practice how to optimize the LSTM and Transformer models by targeting efficiency as a high-level SLO metric and performing a comparative analysis. We show how these models behave when the input workloads come from different distributions. Consequently, we demonstrate their ability to generalize in heterogeneous systems. Finally, we operationalize our two models by integrating them into the Polaris framework we have been developing to enable a performance-driven, SLO-native approach to cloud computing.
Pages: 24-34
Page count: 11
Related papers
50 in total
  • [21] JAPO: learning join and pushdown order for cloud-native join optimization
    Yuan, Yuchen
    Feng, Xiaoyue
    Zhang, Bo
    Zhang, Pengyi
    Song, Jie
    FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (06)
  • [22] JAPO: learning join and pushdown order for cloud-native join optimization
    YUAN Yuchen
    FENG Xiaoyue
    ZHANG Bo
    ZHANG Pengyi
    SONG Jie
    Frontiers of Computer Science, 2024, 18 (06)
  • [23] A Cloud-Native Online Judge System
    Pan, Guan-Chen
    Liu, Pangfeng
    Wu, Jan-Jan
    2022 IEEE 46TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2022), 2022, : 1293 - 1298
  • [24] State Management for Cloud-Native Applications
    Szalay, Mark
    Matray, Peter
    Toka, Laszlo
    ELECTRONICS, 2021, 10 (04) : 1 - 27
  • [25] Cloud-Native Transactions and Analytics in SingleStore
    Prout, Adam
    Wang, Szu-Po
    Victor, Joseph
    Sun, Zhou
    Li, Yongzhu
    Chen, Jack
    Bergeron, Evan
    Hanson, Eric
    Walzer, Robert
    Gomes, Rodrigo
    Shamgunov, Nikita
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 2340 - 2352
  • [26] Benchmarking Scalability of Cloud-Native Applications
    Henning, Sören
    Hasselbring, Wilhelm
    Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft fur Informatik (GI), 2023, P-332 : 59 - 60
  • [27] Forensic analysis of cloud-native artifacts
    Roussev, Vassil
    McCulley, Shane
    DIGITAL INVESTIGATION, 2016, 16 : S104 - S113
  • [28] A Holistic Approach to Data Access for Cloud-Native Analytics and Machine Learning
    Koutsovasilis, Panos
    Venugopal, Srikumar
    Gkoufas, Yiannis
    Pinto, Christian
    2021 IEEE 14TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD 2021), 2021, : 654 - 659
  • [29] Monitoring Probe Deployment Patterns for Cloud-Native Applications: Definition and Empirical Assessment
    Tundo, Alessandro
    Mobilio, Marco
    Riganelli, Oliviero
    Mariani, Leonardo
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2024, 17 (04) : 1636 - 1654
  • [30] ITS_LIVE: A Cloud-Native Approach to Monitoring Glaciers From Space
    Lopez, Luis A.
    Gardner, Alex S.
    Greene, Chad A.
    Kennedy, Joseph H.
    Liukis, Maria
    Fahnestock, Mark A.
    Scambos, Ted
    Fahnestock, Jacob R.
    COMPUTING IN SCIENCE & ENGINEERING, 2023, 25 (06) : 49 - 56