Reliability in Cloud Computing System: A Review

Authors
Duan W. [1]
Hu M. [1]
Zhou Q. [2]
Wu T. [1]
Zhou J. [3]
Liu X. [4]
Wei T. [1]
Chen M. [1]
Affiliations
[1] Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai
[2] School of Economics and Finance, Shanghai International Studies University, Shanghai
[3] School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing
[4] School of Information Technology, Deakin University, Melbourne, VIC 3125
Funding
National Natural Science Foundation of China
Keywords
Cloud computing; Energy consumption; Fault management; Reliability; Virtualization
DOI
10.7544/issn1000-1239.2020.20180675
Abstract
As a new computing paradigm, cloud computing has attracted extensive attention from both academia and industry. Based on resource virtualization technology, cloud computing provides users with services in the form of infrastructure, platform, and software in a "pay-as-you-go" manner. Meanwhile, because cloud computing provides highly scalable computing resources, more and more enterprises and organizations choose cloud computing platforms to deploy their scientific or commercial applications. However, as the number of cloud users increases, cloud data centers continue to expand and their architecture becomes increasingly complex, leading to a growing number of runtime failures in cloud computing systems. Ensuring system reliability in large-scale cloud computing systems with complex architectures has therefore become a major challenge. This paper first summarizes the various failures in cloud systems, introduces several methods for evaluating the reliability of cloud computing, and describes some key fault management mechanisms. Since fault management techniques inevitably increase the energy consumption of cloud systems, this paper then reviews current research on the trade-off between reliability and energy efficiency in cloud computing. Finally, we present some major challenges in current research on cloud computing reliability and conclude the paper. © 2020, Science Press. All rights reserved.
Pages: 102-123
Number of pages: 21