Heterogeneous Task Scheduling Framework in Emerging Distributed Computing Systems

Cited by: 0
Authors
Liu R.-Q. [1 ]
Li B.-Y. [1 ]
Gao Y.-J. [1 ]
Li C.-S. [1 ]
Zhao H.-T. [2 ]
Jin F.-S. [1 ]
Li R.-H. [1 ]
Wang G.-R. [1 ]
Affiliations
[1] School of Computer Science and Technology, Beijing Institute of Technology, Beijing
[2] School of Computer Science and Technology, Northeastern University, Shenyang
Source
Ruan Jian Xue Bao/Journal of Software | 2022, Vol. 33, No. 03
Keywords
Autoscale; Distributed computing; Heterogeneous task; Load balance; Task scheduling;
DOI
10.13328/j.cnki.jos.006451
Abstract
With the rapid development of big data and machine learning, distributed big data computing engines for machine learning have emerged. These systems support both batch distributed learning and incremental learning and verification, with low latency and high performance. However, some of them adopt a random task scheduling strategy that ignores the performance differences between nodes, which easily leads to uneven load and performance degradation. At the same time, scheduling fails for tasks whose resource requirements cannot be met. To address these problems, a heterogeneous task scheduling framework is proposed that ensures tasks are scheduled and executed efficiently. Specifically, for the task scheduling module, the framework provides a probabilistic random scheduling strategy, resource-Pick_kx, and a deterministic smooth weighted round-robin algorithm, both built around the heterogeneous computing resources of the nodes. The resource-Pick_kx algorithm computes a probability from each node's performance and schedules tasks randomly according to these probabilities, so that a higher-performance node is more likely to receive a task. The smooth weighted round-robin algorithm initializes the weights according to node performance and smoothly adjusts them during the scheduling process, so that tasks are preferentially scheduled to higher-performance nodes. In addition, for scenarios where the available resources do not meet a task's requirements, a container-based vertical scaling mechanism is proposed to customize task resources, create new nodes that join the cluster, and re-schedule the task. The performance of the framework is evaluated experimentally on benchmarks and public data sets; compared with the current strategy, the proposed framework improves performance by 10% to 20%. © Copyright 2022, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
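The abstract describes the two node-selection strategies only at a high level. Below is a minimal Python sketch of how they might be realized; the node dictionaries, the "score" and "weight" fields, and the function names pick_kx and smooth_weighted_round_robin are illustrative assumptions, not the authors' implementation.

    import random

    def pick_kx(nodes):
        # Sketch of the resource-Pick_kx idea: pick a node at random with
        # probability proportional to its performance score, so stronger
        # nodes are chosen more often ("score" is an assumed field).
        total = sum(n["score"] for n in nodes)
        r = random.uniform(0, total)
        acc = 0.0
        for node in nodes:
            acc += node["score"]
            if r <= acc:
                return node
        return nodes[-1]

    def smooth_weighted_round_robin(nodes):
        # One step of smooth weighted round-robin: every node's current
        # weight grows by its static weight, the node with the largest
        # current weight is selected, and its current weight is reduced
        # by the total static weight so selections spread out smoothly.
        total = sum(n["weight"] for n in nodes)
        for node in nodes:
            node["current"] = node.get("current", 0) + node["weight"]
        best = max(nodes, key=lambda n: n["current"])
        best["current"] -= total
        return best

    # Example: three nodes with heterogeneous performance.
    cluster = [
        {"name": "node-a", "score": 4, "weight": 4},
        {"name": "node-b", "score": 2, "weight": 2},
        {"name": "node-c", "score": 1, "weight": 1},
    ]
    print([smooth_weighted_round_robin(cluster)["name"] for _ in range(7)])
    # Over one cycle of 7 picks: node-a appears 4 times, node-b twice, node-c once.

In the weighted-random sketch, task placement follows the score distribution only in expectation; the smooth weighted round-robin variant is deterministic and avoids bursts of consecutive picks on the heaviest node, which matches the abstract's contrast between the probabilistic and deterministic strategies.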
Pages: 1005-1017
Number of pages: 12