Predicting the performance of big data applications on the cloud

Cited by: 0
Authors
D. Ardagna
E. Barbierato
E. Gianniti
M. Gribaudo
T. B. M. Pinto
A. P. C. da Silva
J. M. Almeida
Affiliations
[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria
[2] Universidade Federal de Minas Gerais, Departamento de Ciência da Computação
Source
The Journal of Supercomputing | 2021 / Vol. 77
Keywords
Performance prediction; Apache Spark; Parallel computing; Data science; Big data; Analytical and simulation models
DOI
Not available
Abstract
Data science applications have become widespread as a means to extract knowledge from large datasets. Such applications are often characterized by highly heterogeneous and irregular data access patterns, and are thus often referred to as big data applications. These characteristics make it challenging for existing software and hardware infrastructures to meet the applications' resource demands. The cloud computing paradigm, in turn, offers a natural hosting solution for such applications, since its on-demand pricing model allows computing resources to be allocated effectively according to the application's needs. However, these properties pose an additional challenge to the accurate performance prediction of cloud-based applications, which is a key step toward adequate capacity planning and management of the hosting infrastructure. In this article, we tackle this challenge by exploring three modeling approaches for predicting the performance of big data applications running on the cloud. We evaluate two queuing-based analytical models and dagSim, a fast ad hoc simulator, in various scenarios based on different applications and infrastructure setups. The considered approaches are compared in terms of prediction accuracy and execution time. Our results indicate that our two best approaches, one analytical model and dagSim, can predict average application execution times with a relative error of only up to 7%, on average. Moreover, a comparison with the widely used event-based simulator available with the Java Modeling Tool (JMT) suite demonstrates that both the analytical model and dagSim run very fast, requiring at least two orders of magnitude less execution time than JMT while providing slightly better accuracy, which makes them practical for online prediction.
Pages: 1321-1353
Number of pages: 32