A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications

Cited by: 3
Authors
Ataie, Ehsan [1, 2]
Evangelinou, Athanasia [3 ]
Gianniti, Eugenio [3 ]
Ardagna, Danilo [3 ]
Affiliations
[1] Univ Mazandaran, Dept Comp Engn, Babolsar, Iran
[2] Univ Mazandaran, Distributed Comp Syst Res Grp, Babolsar, Iran
[3] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
Keywords
analytical performance modeling; machine learning; cloud computing; MapReduce; Hadoop; Tez; Spark
DOI
10.1093/comjnl/bxab131
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Apache Hadoop and Apache Spark are two of the most prominent distributed solutions on the market for processing big data applications. Since these frameworks are often adopted to support business-critical activities, it is important to predict the execution time of submitted applications with fair confidence, for instance when service-level agreements are established with end users. In this work, we propose and validate a hybrid approach to performance prediction for big data applications running on clouds. The approach exploits both analytical modeling and machine learning (ML) techniques, and achieves good accuracy without requiring many time-consuming and costly experiments on a real setup. The experimental results show that the proposed approach improves on ML techniques applied without any support from analytical models in terms of accuracy, the number of experiments to be run on the operational system, and cost. Moreover, we compare our approach with Ernest, an ML-based technique proposed in the literature by the Spark inventors. Experiments show that Ernest accurately estimates performance in interpolating scenarios but fails to predict performance when configurations with an increasing number of cores are considered. Finally, a comparison with a similar hybrid approach proposed in the literature demonstrates that our approach significantly reduces prediction errors, especially when few experiments on the real system are performed.
Pages: 3123-3140
Number of pages: 18