Machine Learning for Performance Prediction of Spark Cloud Applications

Cited by: 19
Authors
Maros, Alexandre [1]
Murai, Fabricio [1]
Couto da Silva, Ana Paula [1]
Almeida, Jussara M. [1]
Lattuada, Marco [2]
Gianniti, Eugenio [2]
Hosseini, Marjan [2]
Ardagna, Danilo [2]
Affiliations
[1] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
Source
2019 IEEE 12TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (IEEE CLOUD 2019) | 2019
Funding
European Union Horizon 2020
Keywords
Performance prediction; Spark; Machine learning;
DOI
10.1109/CLOUD.2019.00028
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Big data applications and analytics are employed in many sectors for a variety of goals: improving customer satisfaction, predicting market behavior, or improving processes in public health. These applications consist of complex software stacks that are often run on cloud systems. Predicting execution times is important for estimating the cost of cloud services and for effectively managing the underlying resources at runtime. Machine Learning (ML) provides black-box solutions that model the relationship between application performance and system configuration without requiring detailed knowledge of the system, and it has become a popular way of predicting the performance of big data applications. We investigate the costs and benefits of using supervised ML models for predicting the performance of applications on Spark, one of today's most widely used frameworks for big data analysis. We compare our approach with Ernest (an ML-based technique proposed in the literature by the Spark inventors) on a range of scenarios, application workloads, and cloud system configurations. Our experiments show that Ernest can accurately estimate the performance of very regular applications, but it fails when applications exhibit more irregular patterns and/or when extrapolating to larger data set sizes. Results show that our models match or exceed Ernest's performance, sometimes enabling us to reduce the prediction error from 126-187% to only 5-19%.
Pages: 99-106
Number of pages: 8