A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications

Cited by: 3
Authors
Ataie, Ehsan [1, 2]
Evangelinou, Athanasia [3 ]
Gianniti, Eugenio [3 ]
Ardagna, Danilo [3 ]
Affiliations
[1] Univ Mazandaran, Dept Comp Engn, Babolsar, Iran
[2] Univ Mazandaran, Distributed Comp Syst Res Grp, Babolsar, Iran
[3] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
Keywords
analytical performance modeling; machine learning; cloud computing; MapReduce; Hadoop; Tez; Spark
DOI
10.1093/comjnl/bxab131
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Apache Hadoop and Apache Spark are two of the most prominent distributed solutions for processing big data applications on the market. Since in many cases these frameworks are adopted to support business-critical activities, it is often important to predict with fair confidence the execution time of submitted applications, for instance when service-level agreements are established with end users. In this work, we propose and validate a hybrid approach for the performance prediction of big data applications running on clouds, which exploits both analytical modeling and machine learning (ML) techniques and achieves good accuracy without requiring many time-consuming and costly experiments on a real setup. The experimental results show that the proposed approach improves on ML techniques applied without any support from analytical models in terms of accuracy, number of experiments to be run on the operational system, and cost. Moreover, we compare our approach with Ernest, an ML-based technique proposed in the literature by the Spark inventors. Experiments show that Ernest can accurately estimate performance in interpolating scenarios, while it fails to predict performance when configurations with an increasing number of cores are considered. Finally, a comparison with a similar hybrid approach proposed in the literature demonstrates that our approach significantly reduces prediction errors, especially when few experiments on the real system are performed.
Pages: 3123 - 3140
Number of pages: 18
Related papers
50 in total
  • [1] Performance Prediction of Cloud-Based Big Data Applications
    Ardagna, Danilo
    Barbierato, Enrico
    Evangelinou, Athanasia
    Gianniti, Eugenio
    Gribaudo, Marco
    Pinto, Tulio B. M.
    Guimaraes, Anna
    da Silva, Ana Paula Couto
    Almeida, Jussara M.
    PROCEEDINGS OF THE 2018 ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING (ICPE '18), 2018, : 192 - 199
  • [2] Memory Scaling of Cloud-Based Big Data Systems: A Hybrid Approach
    Wang, Xinying
    Xu, Cong
    Wang, Ke
    Yan, Feng
    Zhao, Dongfang
    IEEE TRANSACTIONS ON BIG DATA, 2022, 8 (05) : 1259 - 1272
  • [3] A Cloud-Based Framework for Machine Learning Workloads and Applications
    Lopez Garcia, Alvaro
    Marco De Lucas, Jesus
    Antonacci, Marica
    Zu Castell, Wolfgang
    David, Mario
    Hardt, Marcus
    Lloret Iglesias, Lara
    Molto, German
    Plociennik, Marcin
    Viet Tran
    Alic, Andy S.
    Caballer, Miguel
    Campos Plasencia, Isabel
    Costantini, Alessandro
    Dlugolinsky, Stefan
    Duma, Doina Cristina
    Donvito, Giacinto
    Gomes, Jorge
    Heredia Cacha, Ignacio
    Ito, Keiichi
    Kozlov, Valentin Y.
    Giang Nguyen
    Orviz Fernandez, Pablo
    Sustr, Zdenek
    Wolniewicz, Pawel
    IEEE ACCESS, 2020, 8 (08) : 18681 - 18692
  • [4] Performance-Aware Refactoring of Cloud-based Big Data Applications
    Li, Chen
    Casale, Giuliano
    PROCEEDINGS 2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), 2017, : 1505 - 1510
  • [5] A Performance Analysis of MapReduce Applications on Big Data in Cloud based Hadoop
    Gohil, Parth
    Garg, Dweepna
    Panchal, Bakul
    2014 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2014,
  • [6] Toward a Cloud-based security intelligence with big data processing
    Benzidane, Karim
    El Alloussi, Hassan
    El Warrak, Othman
    Fetjah, Leila
    Andaloussi, Said Jai
    Sekkaki, Abderrahim
    NOMS 2016 - 2016 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, 2016, : 1089 - 1092
  • [7] A multi-layered performance analysis for cloud-based topic detection and tracking in Big Data applications
    Wang, Meisong
    Jayaraman, Prem Prakash
    Solaiman, Ellis
    Chen, Lydia Y.
    Li, Zheng
    Jun, Song
    Georgakopoulos, Dimitrios
    Ranjan, Rajiv
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 87 : 580 - 590
  • [8] Performance modeling of big data applications in the cloud centers
    Chao Shen
    Weiqin Tong
    Jenq-Neng Hwang
    Qiang Gao
    The Journal of Supercomputing, 2017, 73 : 2258 - 2283
  • [9] A Combined Analytical Modeling Machine Learning Approach for Performance Prediction of MapReduce Jobs in Cloud Environment
    Ataie, Ehsan
    Gianniti, Eugenio
    Ardagna, Danilo
    Movaghar, Ali
    PROCEEDINGS OF 2016 18TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC), 2016, : 431 - 439
  • [10] Cloud-based disaster management architecture using hybrid machine learning approach in IoT
    Ozen, Figen
    Souri, Alireza
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (29) : 72357 - 72370