A Hard Real-time Scheduler for Spark on YARN

Cited by: 4

Authors:

Wang, Guolu [1]

Xu, Jungang [1]

Liu, Renfeng [1]

Huang, Shanshan [2]

Affiliations:

[1] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing, Peoples R China

[2] Beijing Univ Technol, Sch Software, Beijing, Peoples R China

Source:

2018 18TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

Spark; YARN; hard real-time; deadline; value density;

DOI:

10.1109/CCGRID.2018.00096

Chinese Library Classification (CLC) number:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Apache Spark is a fast, general-purpose engine for large-scale data processing using distributed memory. It provides several deploy modes to meet the needs of different users, and Spark on YARN is the most popular of them. Each deploy mode has its own scheduling mechanism. Spark on YARN has three schedulers: the FIFO Scheduler, the Fair Scheduler, and the Capacity Scheduler. However, none of these three schedulers fits hard real-time application scenarios. As Apache Spark is applied more widely, the need for hard real-time scheduling will grow quickly. In this paper, we propose a novel hard real-time scheduling algorithm called DVDA (Deadline and Value Density-Aware) to meet the requirements of hard real-time scheduling. Compared with the traditional EDF (Earliest Deadline First) algorithm, which considers only the deadline, the DVDA algorithm considers both the deadline and the value density of the application. Furthermore, we implement a DVDA Scheduler for Spark on YARN based on the DVDA algorithm. Finally, experiments are conducted to verify the effectiveness of the algorithm. Experimental results show that the proposed algorithm increases the application completion rate by 18% and 6%, and Value Income by 78% and 32%, compared with the default Capacity scheduler and the EDF-Capacity scheduler, respectively.
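The abstract does not spell out how DVDA combines the two criteria, so the following Python sketch is only an illustration of why ordering by deadline alone can lose value. It is not the paper's algorithm. It assumes a single non-preemptive resource, defines value density as value divided by estimated run time (a common definition, assumed here), and uses a hypothetical rule: discard applications that can no longer meet their deadline, then pick the highest value density and break ties by earliest deadline. The names `App`, `simulate`, `edf_key`, and `density_key` are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class App:
    name: str
    deadline: float   # absolute deadline, in seconds from time 0
    exec_time: float  # estimated run time, in seconds
    value: float      # value earned only if the app finishes by its deadline

    @property
    def value_density(self) -> float:
        # Assumed definition: value earned per second of run time.
        return self.value / self.exec_time


def edf_key(app: App) -> Tuple[float, ...]:
    # Earliest Deadline First: only the deadline matters.
    return (app.deadline,)


def density_key(app: App) -> Tuple[float, ...]:
    # Hypothetical rule (not the paper's DVDA): highest value density first,
    # earliest deadline as the tie-breaker.
    return (-app.value_density, app.deadline)


def simulate(apps: List[App],
             key: Callable[[App], Tuple[float, ...]]) -> Tuple[List[str], float]:
    """Run apps one at a time and return (completed names, total value)."""
    now = 0.0
    pending = list(apps)
    completed: List[str] = []
    income = 0.0
    while pending:
        # Hard real-time: an app that cannot finish by its deadline is dropped.
        pending = [a for a in pending if now + a.exec_time <= a.deadline]
        if not pending:
            break
        nxt = min(pending, key=key)
        pending.remove(nxt)
        now += nxt.exec_time
        completed.append(nxt.name)
        income += nxt.value
    return completed, income


# Made-up workload for illustration only.
apps = [
    App("A", deadline=4, exec_time=4, value=4),
    App("B", deadline=5, exec_time=2, value=10),
    App("C", deadline=7, exec_time=2, value=6),
]

edf_result = simulate(apps, edf_key)
density_result = simulate(apps, density_key)
print("EDF:", edf_result)
print("Density-aware sketch:", density_result)
```

On this made-up workload both orderings complete two applications, but EDF runs the low-value application A first and misses B, while the density-aware ordering runs B first and earns more total value. The numbers say nothing about the results reported in the paper.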


Pages: 645-652

Page count: 8

