Machine Learning for Performance Prediction of Spark Cloud Applications

Cited by: 19

Authors:

Maros, Alexandre [1]

Murai, Fabricio [1]

Couto da Silva, Ana Paula [1]

Almeida, Jussara M. [1]

Lattuada, Marco [2]

Gianniti, Eugenio [2]

Hosseini, Marjan [2]

Ardagna, Danilo [2]

Affiliations:

[1] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil

[2] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy

Source:

2019 IEEE 12TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (IEEE CLOUD 2019) | 2019

Funding:

European Union Horizon 2020

Keywords:

Performance prediction; Spark; Machine learning

DOI:

10.1109/CLOUD.2019.00028

Chinese Library Classification number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Big data applications and analytics are employed in many sectors for a variety of goals: improving customer satisfaction, predicting market behavior, or improving processes in public health. These applications consist of complex software stacks that are often run on cloud systems. Predicting execution times is important for estimating the cost of cloud services and for effectively managing the underlying resources at runtime. Machine Learning (ML) provides black-box solutions that model the relationship between application performance and system configuration without requiring detailed knowledge of the system, and it has become a popular way of predicting the performance of big data applications. We investigate the costs and benefits of using supervised ML models for predicting the performance of applications on Spark, one of today's most widely used frameworks for big data analysis. We compare our approach with Ernest (an ML-based technique proposed in the literature by the Spark inventors) on a range of scenarios, application workloads, and cloud system configurations. Our experiments show that Ernest can accurately estimate the performance of very regular applications, but it fails when applications exhibit more irregular patterns and/or when extrapolating to larger data set sizes. Results show that our models match or exceed Ernest's performance, sometimes enabling us to reduce the prediction error from 126-187% to only 5-19%.


Pages: 99-106

Page count: 8
