Predicting the performance of big data applications on the cloud

Cited by: 0
Authors
D. Ardagna
E. Barbierato
E. Gianniti
M. Gribaudo
T. B. M. Pinto
A. P. C. da Silva
J. M. Almeida
Affiliations
[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria
[2] Universidade Federal de Minas Gerais, Departamento de Ciência da Computação
Source
Journal of Supercomputing, 2021, 77 (02)
Keywords
Performance prediction; Apache Spark; Parallel computing; Data science; Big data; Analytical and simulation models
DOI
Not available
Abstract
Data science applications have become widespread as a means to extract knowledge from large datasets. Such applications are often characterized by highly heterogeneous and irregular data access patterns, and are thus often referred to as big data applications. These characteristics make it challenging for existing software and hardware infrastructures to meet the applications' resource demands. The cloud computing paradigm, in turn, offers a natural hosting solution for such applications, since its on-demand pricing model allows computing resources to be allocated effectively according to the application's needs. However, these properties pose an additional challenge to accurate performance prediction of cloud-based applications, which is a key step toward adequate capacity planning and management of the hosting infrastructure. In this article, we tackle this challenge by exploring three modeling approaches for predicting the performance of big data applications running on the cloud. We evaluate two queuing-based analytical models and dagSim, a fast ad-hoc simulator, in various scenarios based on different applications and infrastructure setups. The considered approaches are compared in terms of prediction accuracy and execution time. Our results indicate that our two best approaches, one analytical model and dagSim, can predict average application execution times with a relative error of only up to 7%, on average. Moreover, a comparison with the widely used event-based simulator available in the Java Modelling Tools (JMT) suite demonstrates that both the analytical model and dagSim run very fast, requiring at least two orders of magnitude less execution time than JMT while providing slightly better accuracy, and are thus practical for online prediction.
Pages: 1321-1353
Page count: 32
Related papers
50 in total
  • [1] Predicting the performance of big data applications on the cloud
    Ardagna, D.
    Barbierato, E.
    Gianniti, E.
    Gribaudo, M.
    Pinto, T. B. M.
    da Silva, A. P. C.
    Almeida, J. M.
    JOURNAL OF SUPERCOMPUTING, 2021, 77 (02): 1321-1353
  • [2] Performance modeling of big data applications in the cloud centers
    Chao Shen
    Weiqin Tong
    Jenq-Neng Hwang
    Qiang Gao
    The Journal of Supercomputing, 2017, 73: 2258-2283
  • [3] Performance Evaluation of Big Data Applications in Cloud Providers
    Dourado, Leonardo dos Santos
    Miranda, Richard Siqueira
    de Araujo, Aleteia P. F.
    Ishikawa, Edson
    2020 15TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI'2020), 2020
  • [4] Performance modeling of big data applications in the cloud centers
    Shen, Chao
    Tong, Weiqin
    Hwang, Jenq-Neng
    Gao, Qiang
    JOURNAL OF SUPERCOMPUTING, 2017, 73 (05): 2258-2283
  • [5] Performance Prediction of Cloud-Based Big Data Applications
    Ardagna, Danilo
    Barbierato, Enrico
    Evangelinou, Athanasia
    Gianniti, Eugenio
    Gribaudo, Marco
    Pinto, Tulio B. M.
    Guimaraes, Anna
    da Silva, Ana Paula Couto
    Almeida, Jussara M.
    PROCEEDINGS OF THE 2018 ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING (ICPE '18), 2018: 192-199
  • [6] Performance analysis model for big data applications in cloud computing
    Villalpando, Luis Eduardo Bautista
    April, Alain
    Abran, Alain
    JOURNAL OF CLOUD COMPUTING-ADVANCES SYSTEMS AND APPLICATIONS, 2014, 3
  • [7] Improving Performance of Cloud Computing and Big Data Technologies and Applications
    Zhenjiang Dong
    ZTE Communications, 2014, 12 (04): 1-2
  • [8] A Performance Analysis of MapReduce Applications on Big Data in Cloud based Hadoop
    Gohil, Parth
    Garg, Dweepna
    Panchal, Bakul
    2014 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2014
  • [9] Performance evaluation of edge cloud computing system for big data applications
    Femminella, Mauro
    Pergolesi, Matteo
    Reali, Gianluca
    2016 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD NETWORKING (IEEE CLOUDNET), 2016: 170-175
  • [10] I/O Performance Modeling for Big Data Applications over Cloud Infrastructures
    Mytilinis, Ioannis
    Tsoumakos, Dimitrios
    Kantere, Verena
    Nanos, Anastassios
    Koziris, Nectarios
    2015 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E 2015), 2015: 201-206