Reliability Enhancement in Cloud Computing via Optimized Job Scheduling Implementing Reinforcement Learning Algorithm and Queuing Theory

Cited by: 10

Authors:

Balla, Husamelddin A. M. N. ^{[1]}

Chen, Guang Sheng ^{[1]}

Jing, Weipeng ^{[1]}

Affiliations:

[1] Northeast Forestry Univ, Coll Informat & Comp Engn, Harbin, Heilongjiang, Peoples R China

Source:

2018 1ST INTERNATIONAL CONFERENCE ON DATA INTELLIGENCE AND SECURITY (ICDIS 2018) | 2018

Keywords:

Reinforcement learning; Q-learning; reliability; queuing theory; cloud computing; MANAGEMENT;

DOI:

10.1109/ICDIS.2018.00027

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Discipline classification code:

081202 ;

Abstract:

Reliability in cloud systems is essential to delivering stable cloud services to users. Focusing on improving the successful execution of tasks under resource constraints, this work proposes a resource management method to achieve reliability within the cloud environment. The proposed method combines an adaptive reinforcement learning algorithm with queuing theory to schedule user requests. Dynamic changes in resource availability and attributes in the cloud environment make reliable task execution difficult to guarantee. To address this problem, our approach employs a task scheduler that adapts to those dynamic changes and successfully schedules user requests. We developed an adaptive action-selection method that dynamically controls action selection (i.e., selection of a suitable virtual machine), considering the queue buffer size and the uncertainty value function. To evaluate the performance of our approach, we conducted several experiments and compared it with greedy and random job scheduling policies in terms of successful task execution, utilization rate, and response time. The numerical results demonstrate the efficiency of our method.
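The abstract gives no algorithmic detail, so the following is only an illustrative sketch of how tabular Q-learning and a bounded queue buffer might be combined to pick a virtual machine. It is not the authors' method. The state encoding (a tuple of per-VM queue lengths), the load-scaled exploration rate, the reward signal, and all names (`make_q_table`, `select_vm`, `q_update`, `base_epsilon`) are assumptions made for illustration. The paper's "uncertainty value function" is not modeled here.

```python
import random
from collections import defaultdict


def make_q_table(num_vms):
    """Q-table mapping a state (tuple of queue lengths) to one value per VM."""
    return defaultdict(lambda: [0.0] * num_vms)


def select_vm(q_table, state, buffer_size, base_epsilon=0.1, rng=random):
    """Epsilon-greedy VM choice restricted to VMs whose queue has room.

    Returns None when every queue is full, i.e. the request cannot be placed.
    """
    candidates = [i for i, n in enumerate(state) if n < buffer_size]
    if not candidates:
        return None
    # Assumption: explore less as the candidate queues fill up.
    load = sum(state[i] for i in candidates) / (buffer_size * len(candidates))
    epsilon = base_epsilon * (1.0 - load)
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda i: q_table[state][i])


def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update for one (s, a, r, s') transition."""
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])


if __name__ == "__main__":
    q = make_q_table(num_vms=2)
    state = (0, 0)
    vm = select_vm(q, state, buffer_size=2)
    # Hypothetical reward: 1.0 for a task that completed successfully.
    q_update(q, state, vm, reward=1.0, next_state=(1, 0) if vm == 0 else (0, 1))
    print(vm, q[state])
```

Restricting the candidate set to VMs with free buffer space is one simple way to let queue state influence action selection. A real scheduler would also need an arrival and service model to drive the queue lengths.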

References

Pages: 127-130

Page count: 4

10 items in total

[1]

[Anonymous], 2013, DSN 13, DOI 10.1109/DSN.2013.6575322

[2]

[Anonymous], 2009, ARXIV09032525

[3]

[Anonymous], 1998, REINFORCEMENT LEARNI

[4] Improving reliability in resource management through adaptive reinforcement learning for distributed systems [J].

Hussin, Masnida ;

Hamid, Nor Asilah Wati Abdul ;

Kasmiran, Khairul Azhar .

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2015, 75 :93-100

[5] Fault Tolerance Management in Cloud Computing: A System-Level Perspective [J].

Jhawar, Ravi ;

Piuri, Vincenzo ;

Santambrogio, Marco .

IEEE SYSTEMS JOURNAL, 2013, 7 (02) :288-297

[6] STOCHASTIC PROCESSES OCCURRING IN THE THEORY OF QUEUES AND THEIR ANALYSIS BY THE METHOD OF THE IMBEDDED MARKOV CHAIN [J].

KENDALL, DG .

ANNALS OF MATHEMATICAL STATISTICS, 1953, 24 (03) :338-354

[7] Performance analysis of cloud computing services considering resources sharing among virtual machines [J].

Liu, Xiaodong ;

Tong, Weiqin ;

Zhi, Xiaoli ;

Fu ZhiRen ;

Liao WenZhao .

JOURNAL OF SUPERCOMPUTING, 2014, 69 (01) :357-374

[8]

Vaquero LM, 2009, ACM SIGCOMM COMP COM, V39, P50, DOI 10.1145/1496091.1496100

[9]

Vouk Mladen A., 2008, Journal of Computing and Information Technology - CIT, V16, P235, DOI 10.2498/cit.1001391

[10] A novel multi-agent reinforcement learning approach for job scheduling in Grid computing [J].

Wu, Jun ;

Xu, Xin ;

Zhang, Pengcheng ;

Liu, Chunming .

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2011, 27 (05) :430-439
