Reliability in Cloud Computing System: A Review

Authors
Duan W. [1]
Hu M. [1]
Zhou Q. [2]
Wu T. [1]
Zhou J. [3]
Liu X. [4]
Wei T. [1]
Chen M. [1]
Affiliations
[1] Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai
[2] School of Economics and Finance, Shanghai International Studies University, Shanghai
[3] School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing
[4] School of Information Technology, Deakin University, Melbourne, 3125, VIC
Funding
National Natural Science Foundation of China
Keywords
Cloud computing; Energy consumption; Fault management; Reliability; Virtualization
DOI
10.7544/issn1000-1239.2020.20180675
Abstract
As a new computing paradigm, cloud computing has attracted extensive attention from both academia and industry. Built on resource virtualization technology, cloud computing provides users with services in the form of infrastructure, platform, and software in a "pay-as-you-go" manner. Because it also offers highly scalable computing resources, more and more enterprises and organizations choose cloud platforms to deploy their scientific or commercial applications. However, as the number of cloud users grows, cloud data centers keep expanding and their architecture becomes increasingly complex, leading to a growing number of runtime failures in cloud computing systems. Ensuring reliability in large-scale cloud computing systems with complex architectures has therefore become a major challenge. This paper first summarizes the various failures that occur in cloud systems, introduces several methods for evaluating the reliability of cloud computing, and describes some key fault management mechanisms. Since fault management techniques inevitably increase the energy consumption of cloud systems, the paper then reviews current research on the trade-off between reliability and energy efficiency in cloud computing. Finally, we identify some major challenges in current research on cloud computing reliability and conclude the paper. © 2020, Science Press. All rights reserved.
Pages: 102-123
Page count: 21