Fault Tolerance of Cloud Infrastructure with Machine Learning

Cited by: 4
Authors
Kalaskar, Chetankumar [1]
Thangam, S. [1]
Affiliations
[1] Amrita Vishwavidyapeetam, Amrita Sch Comp, Dept Comp Sci & Engn, Bangalore 560035, Karnataka, India
Keywords
Cloud computing; Fault tolerance; Machine learning; Reliability of cloud
DOI
10.2478/cait-2023-0034
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Improving the fault tolerance of cloud systems and accurately forecasting cloud performance are central concerns in cloud computing research. This study addresses both using machine learning models. Drawing on the Google trace dataset, with 10,000 cloud environment records covering diverse metrics, we systematically employed machine learning algorithms, including linear regression, decision trees, and gradient boosting, to construct predictive models. These models outperformed baseline methods, with C5.0 and XGBoost showing exceptional accuracy, precision, and reliability in forecasting cloud behavior. Feature importance analysis identified the ten most influential factors affecting cloud system performance. This work advances cloud optimization and reliability by enabling proactive monitoring, early detection of performance issues, and improved fault tolerance. Future research can further refine these predictive models to enhance cloud resource management and, ultimately, service delivery in cloud computing.
Pages: 26-50
Number of pages: 25