WOHA: Deadline-Aware Map-Reduce Workflow Scheduling Framework over Hadoop Clusters

Cited by: 32
Authors
Li, Shen [1]
Hu, Shaohan [1]
Wang, Shiguang [1]
Su, Lu [2]
Abdelzaher, Tarek [1]
Gupta, Indranil [1]
Pace, Richard [3]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
[2] SUNY Buffalo, Buffalo, NY 14260 USA
[3] Yahoo Inc, Santa Clara, CA 94089 USA
Source
2014 IEEE 34TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2014) | 2014
Funding
National Science Foundation (USA);
Keywords
MapReduce; Hadoop; Workflow; Scheduling; Deadline;
DOI
10.1109/ICDCS.2014.18
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we present WOHA, an efficient scheduling framework for deadline-aware Map-Reduce workflows. In data centers, complex backend data analysis often utilizes a workflow that contains tens or even hundreds of interdependent Map-Reduce jobs. Meeting the deadlines of these workflows is usually of crucial importance to businesses (for example, workflows tightly linked to time-sensitive advertisement placement optimizations can directly affect revenue). Popular Map-Reduce implementations, such as Hadoop, deal with independent Map-Reduce jobs rather than workflows of jobs. To simplify workflow submission, solutions such as Oozie have emerged; they take a workflow configuration file as input and automatically submit its Hadoop jobs at the right time. This separation of information, in which Hadoop handles only resource allocation and Oozie handles only workflow topology, keeps the Hadoop master node out of complex workflow analysis but may unnecessarily lengthen workflow spans and thus cause more deadline misses. To address this problem while preserving the efficiency of the Hadoop master node, WOHA allows client nodes to locally generate scheduling plans, which the master node later uses as resource allocation hints. Under this framework design, we propose a novel scheduling algorithm that improves the deadline satisfaction ratio by dynamically assigning priorities among workflows based on their progress. We implement WOHA by extending Hadoop-1.2.1. Our experiments over an 80-server cluster show that WOHA increases the deadline satisfaction ratio by 10% compared to state-of-the-art solutions, and scales up to tens of thousands of concurrently running workflows.
Pages: 93-103
Page count: 11