Spark;
Job scheduling;
Deep reinforcement learning;
Hybrid cloud environments;
Algorithm;
DOI:
10.1007/s10723-022-09630-1
CLC Number:
TP [Automation Technology, Computer Technology];
Discipline Classification Code:
0812;
Abstract:
Spark is one of the most important big data computing engines, favored by academia and industry for its low latency and ease of use. The explosive growth in data volumes is making computing tasks that once ran on local or on-premise resources infeasible. The emergence of public clouds has alleviated the shortage of local or on-premise resources. However, deploying clusters only on public clouds is costly on the one hand and wasteful of available local resources on the other. Deploying a Spark cluster across both local and public cloud resources therefore becomes a good way to save cost without wasting local resources. When Spark is deployed in hybrid cloud environments, however, its default scheduling policy ignores job and environment characteristics, leading to performance degradation and increased cluster usage costs. In this paper, a deep reinforcement learning-based (DRL-based) Spark job scheduler is proposed to improve cluster performance and reduce the total cost of cluster usage in hybrid cloud environments. Specifically, the proposed DRL agent adaptively learns the characteristics of different types of jobs and of the hybrid cloud environment to schedule Spark jobs rationally, reducing both the total cluster usage cost and the average bounded slowdown of jobs. A simulation environment is built to train the proposed scheduling agent, and the Spark Core module is extended to verify its effectiveness. Experimental results show that, compared to the baseline algorithm in burst arrival mode, the DRL-based algorithm improves performance by 5.55% and reduces the total cluster usage cost by 13.9% on average.
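
The abstract does not specify the scheduler's concrete state, action, or reward design, so the following Python sketch only illustrates the general idea: a policy-gradient (REINFORCE) agent decides, job by job, whether to place work on free-but-slower local resources or on faster-but-paid public-cloud resources, with a reward that penalizes both incremental cloud cost and each job's bounded slowdown. Everything here is an assumption for illustration, not the paper's method: the HybridClusterEnv class, the LOCAL_SPEED/CLOUD_SPEED/CLOUD_COST_PER_UNIT constants, the 0.1 slowdown weight, and the Poisson (non-bursty) arrival model are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)

class HybridClusterEnv:
    """Toy simulation: one Spark job is placed per step on local or cloud nodes."""
    LOCAL_SPEED = 1.0          # hypothetical relative speed of local executors
    CLOUD_SPEED = 1.5          # hypothetical relative speed of cloud executors
    CLOUD_COST_PER_UNIT = 0.4  # hypothetical price per unit of cloud compute time

    def reset(self):
        self.local_free_at = 0.0   # time when local resources next become free
        self.cloud_free_at = 0.0   # time when cloud resources next become free
        self.clock = 0.0
        return self._obs(rng.uniform(1, 10))

    def _obs(self, job_size):
        # State: local backlog, cloud backlog, size of the job to place, bias term.
        self.job_size = job_size
        return np.array([self.local_free_at - self.clock,
                         self.cloud_free_at - self.clock, job_size, 1.0])

    def step(self, action):
        if action == 0:  # place the job on local resources: free but slower
            start = max(self.clock, self.local_free_at)
            run = self.job_size / self.LOCAL_SPEED
            self.local_free_at = start + run
            cost = 0.0
        else:            # place the job on the public cloud: faster but paid
            start = max(self.clock, self.cloud_free_at)
            run = self.job_size / self.CLOUD_SPEED
            self.cloud_free_at = start + run
            cost = self.CLOUD_COST_PER_UNIT * run
        wait = start - self.clock
        slowdown = max((wait + run) / run, 1.0)  # bounded slowdown of this job
        reward = -(cost + 0.1 * slowdown)        # assumed weighting of objectives
        self.clock += rng.exponential(2.0)       # next arrival (Poisson, not bursty)
        return self._obs(rng.uniform(1, 10)), reward

# REINFORCE with a linear softmax policy over the two placement actions.
W = np.zeros((2, 4))
lr = 1e-3

def policy(obs):
    logits = W @ obs
    p = np.exp(logits - logits.max())
    return p / p.sum()

env = HybridClusterEnv()
for episode in range(200):
    obs, grads, rewards = env.reset(), [], []
    for _ in range(50):                          # schedule 50 jobs per episode
        p = policy(obs)
        a = rng.choice(2, p=p)
        grads.append(np.outer(np.eye(2)[a] - p, obs))  # grad of log pi(a|s) wrt W
        obs, r = env.step(a)
        rewards.append(r)
    G = np.cumsum(rewards[::-1])[::-1]           # reward-to-go returns
    G = G - G.mean()                             # mean baseline to reduce variance
    for g, ret in zip(grads, G):
        W += lr * ret * g                        # policy-gradient ascent step

A linear policy and two actions keep the sketch short; the paper's agent would presumably use a deep network and a richer action space (e.g., executor counts or node types), and would be trained in the simulation environment mentioned in the abstract before being deployed via the extended Spark Core module.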