A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications

Cited by: 3
Authors
Ataie, Ehsan [1, 2]
Evangelinou, Athanasia [3 ]
Gianniti, Eugenio [3 ]
Ardagna, Danilo [3 ]
Affiliations
[1] Univ Mazandaran, Dept Comp Engn, Babolsar, Iran
[2] Univ Mazandaran, Distributed Comp Syst Res Grp, Babolsar, Iran
[3] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
Keywords
analytical performance modeling; machine learning; cloud computing; MapReduce; Hadoop; Tez; Spark
DOI
10.1093/comjnl/bxab131
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Nowadays, Apache Hadoop and Apache Spark are two of the most prominent distributed solutions for processing big data applications on the market. Since these frameworks are often adopted to support business-critical activities, it is often important to predict the execution time of submitted applications with fair confidence, for instance when service-level agreements are established with end users. In this work, we propose and validate a hybrid approach for the performance prediction of big data applications running on clouds. The approach exploits both analytical modeling and machine learning (ML) techniques, and achieves good accuracy without requiring many time-consuming and costly experiments on a real setup. The experimental results show that, compared with applying ML techniques without any support from analytical models, the proposed approach improves accuracy, reduces the number of experiments to be run on the operational system and lowers cost. Moreover, we compare our approach with Ernest, an ML-based technique proposed in the literature by the Spark inventors. Experiments show that Ernest can accurately estimate performance in interpolating scenarios, while it fails to predict performance when configurations with an increasing number of cores are considered. Finally, a comparison with a similar hybrid approach proposed in the literature demonstrates that our approach significantly reduces prediction errors, especially when few experiments on the real system are performed.
Pages: 3123 - 3140
Number of pages: 18
Related papers
50 in total
  • [31] A Cloud-Based Machine Vision Approach for Utilization Prediction of Manual Machine Tools
    Parto, Mahmoud
    Han, Dongmin
    Rauby, Pierrick
    Ye, Chong
    Zhou, Yuanlai
    Chau, Duen Horng
    Kurfess, Thomas
    SMART AND SUSTAINABLE MANUFACTURING SYSTEMS, 2019, 3 (02) : 83 - 94
  • [32] Modeling of performance evaluation of educational information based on big data deep learning and cloud platform
    Ye, Jun
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 38 (06) : 7155 - 7165
  • [33] PhishNot: A Cloud-Based Machine-Learning Approach to Phishing URL Detection
    Alani, Mohammed M.
    Tawfik, Hissam
    COMPUTER NETWORKS, 2022, 218
  • [34] An approach for economic evaluation of cloud-based applications
    Pena-Ortiz, Raul
    Domenech, Josep
    Gil, Jose A.
    Pont, Ana
    2014 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD NETWORKING (CLOUDNET), 2014, : 281 - 287
  • [35] Machine Learning for Performance Prediction of Spark Cloud Applications
    Maros, Alexandre
    Murai, Fabricio
    Couto da Silva, Ana Paula
    Almeida, Jussara M.
    Lattuada, Marco
    Gianniti, Eugenio
    Hosseini, Marjan
    Ardagna, Danilo
    2019 IEEE 12TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (IEEE CLOUD 2019), 2019, : 99 - 106
  • [36] Technical and Legal Strategic Approaches Protecting the Privacy of Personal Data in Cloud-Based Big Data Applications
    Arikan, Suleyman Muhammed
    2022 10TH INTERNATIONAL SYMPOSIUM ON DIGITAL FORENSICS AND SECURITY (ISDFS), 2022
  • [37] Adaptive resource planning for cloud-based services using machine learning
    Nawrocki, Piotr
    Grzywacz, Mikolaj
    Sniezynski, Bartlomiej
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 152 : 88 - 97
  • [38] Predicting the performance of big data applications on the cloud
    Ardagna, D.
    Barbierato, E.
    Gianniti, E.
    Gribaudo, M.
    Pinto, T. B. M.
    da Silva, A. P. C.
    Almeida, J. M.
    JOURNAL OF SUPERCOMPUTING, 2021, 77 (02) : 1321 - 1353
  • [39] Architectural Design for Data Security in Cloud-based Big Data Systems
    Jamali, Mujeeb-ur-Rehman
    Ali, Najma Imtiaz
    Memon, Abdul Ghafoor
    Maree, Mujeeb-u-Rehman
    Jamali, Aadil
    BAGHDAD SCIENCE JOURNAL, 2024, 21 (09) : 3062 - 3077
  • [40] Towards Cloud-Based Data Warehouse as a Service for Big Data Analytics
    Dabbechi, Hichem
    Nabli, Ahlem
    Bouzguenda, Lotfi
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2016, PT II, 2016, 9876 : 180 - 189