A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications

Cited by: 3
Authors
Ataie, Ehsan [1, 2]
Evangelinou, Athanasia [3 ]
Gianniti, Eugenio [3 ]
Ardagna, Danilo [3 ]
Affiliations
[1] Univ Mazandaran, Dept Comp Engn, Babolsar, Iran
[2] Univ Mazandaran, Distributed Comp Syst Res Grp, Babolsar, Iran
[3] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
Keywords
analytical performance modeling; machine learning; cloud computing; MapReduce; Hadoop; Tez; Spark
DOI
10.1093/comjnl/bxab131
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Nowadays, Apache Hadoop and Apache Spark are two of the most prominent distributed solutions for processing big data applications on the market. Since in many cases these frameworks are adopted to support business-critical activities, it is often important to predict with fair confidence the execution time of submitted applications, for instance when service-level agreements are established with end-users. In this work, we propose and validate a hybrid approach for the performance prediction of big data applications running on clouds, which exploits both analytical modeling and machine learning (ML) techniques and is able to achieve good accuracy without too many time-consuming and costly experiments on a real setup. The experimental results show that the proposed approach improves accuracy, the number of experiments to be run on the operational system, and cost compared with applying ML techniques without any support from analytical models. Moreover, we compare our approach with Ernest, an ML-based technique proposed in the literature by the Spark inventors. Experiments show that Ernest can accurately estimate the performance in interpolating scenarios, while it fails to predict the performance when configurations with an increasing number of cores are considered. Finally, a comparison with a similar hybrid approach proposed in the literature demonstrates that our approach significantly reduces prediction errors, especially when few experiments on the real system are performed.
Pages: 3123-3140
Number of pages: 18