Demystifying deep learning in predictive monitoring for cloud-native SLOs

Times cited: 1

Authors:

Morichetta, Andrea [1]

Pujol, Victor Casamayor [1]

Nastic, Stefan [1]

Pusztai, Thomas [1]

Raith, Philipp [1]

Dustdar, Schahram [1]

Vij, Deepak [2]

Xiong, Ying [2]

Zhang, Zhaobo [2]

Affiliations:

[1] TU Wien, Distributed Syst Grp, Vienna, Austria

[2] Futurewei Technol Inc, Santa Clara, CA USA

Source:

2023 IEEE 16TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, CLOUD | 2023

Keywords:

workload prediction; neural networks; cloud; LSTM; Transformers; HOST LOAD PREDICTION; WORKLOAD; MODEL;

DOI:

10.1109/CLOUD60044.2023.00013

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The complexity inherent in managing cloud computing systems calls for novel solutions that can effectively and promptly enforce high-level Service Level Objectives (SLOs). Unfortunately, most current SLO management solutions rely on reactive approaches, i.e., they correct SLO violations only after they have occurred. Further, the few methods that explore predictive techniques to prevent SLO violations focus solely on forecasting low-level system metrics, such as CPU and memory utilization. Although valid in some cases, these metrics do not necessarily provide clear and actionable insights into application behavior. This paper presents a novel approach that directly predicts high-level SLOs from low-level system metrics. We pursue this goal by training and optimizing two state-of-the-art neural network models: a Long Short-Term Memory (LSTM)-based model and a Transformer-based model. Our models provide actionable insights into application behavior by establishing proper connections between the evolution of low-level, workload-related metrics and the high-level SLOs. We present our approach to selecting and preparing the data. We show in practice how to optimize the LSTM and Transformer models, targeting efficiency as a high-level SLO metric, and we perform a comparative analysis of the two. We also show how these models behave when the input workloads come from different distributions, thereby demonstrating their ability to generalize in heterogeneous systems. Finally, we operationalize both models by integrating them into Polaris, the framework we have been developing to enable a performance-driven, SLO-native approach to cloud computing.
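The abstract does not say how the low-level metric time series are turned into training samples. A common framing for sequence models such as LSTMs and Transformers is a sliding window: each input holds a fixed number of consecutive time steps of low-level metrics, and the target is the high-level SLO value some steps ahead. The sketch below is a minimal illustration under that assumption only; the function `make_windows` and its `window` and `horizon` parameters are hypothetical names, not the authors' data pipeline.

```python
from typing import List, Sequence, Tuple


def make_windows(
    metrics: Sequence[Sequence[float]],
    slo: Sequence[float],
    window: int,
    horizon: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Frame multivariate low-level metrics as a supervised dataset.

    Hypothetical helper (assumed sliding-window framing, not from the paper):
    each sample X[i] holds `window` consecutive time steps of low-level
    metrics (e.g. CPU, memory), and y[i] is the high-level SLO value
    `horizon` steps after the last step of that window.
    """
    if len(metrics) != len(slo):
        raise ValueError("metrics and slo must have the same length")
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be >= 1")

    X: List[List[List[float]]] = []
    y: List[float] = []
    # A window starting at `start` covers indices [start, start + window);
    # its target sits at index start + window - 1 + horizon, which must exist.
    for start in range(len(metrics) - window - horizon + 1):
        end = start + window  # exclusive end of the input window
        X.append([list(step) for step in metrics[start:end]])
        y.append(slo[end + horizon - 1])
    return X, y


if __name__ == "__main__":
    # Toy series: two low-level metrics per step and one SLO value per step.
    metrics = [[0.1, 0.2], [0.2, 0.3], [0.3, 0.4], [0.4, 0.5], [0.5, 0.6]]
    slo = [0.9, 0.8, 0.7, 0.6, 0.5]
    X, y = make_windows(metrics, slo, window=2, horizon=1)
    print(len(X), y)
```

The resulting samples have shape (samples, window, features), which is the layout a recurrent or attention-based encoder would typically consume; the window length and forecast horizon are tuning choices the sketch leaves to the caller.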


Pages: 24-34

Page count: 11
