Piranha: Optimizing Short Jobs in Hadoop

Cited by: 24
Authors
Elmeleegy, Khaled [1]
Affiliation
[1] Turn Inc, Redwood City, CA 84101 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2013 / Vol. 6 / Issue 11
DOI
10.14778/2536222.2536225
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Cluster computing has emerged as a key parallel processing platform for large-scale data. All major internet companies use it as their central processing platform. One of cluster computing's most popular examples is MapReduce and its open source implementation, Hadoop. These systems were originally designed for batch and massive-scale computations. Interestingly, over time their production workloads have evolved into a mix of a small fraction of large, long-running jobs and a much bigger fraction of short jobs. This came about because these systems end up being used as data warehouses, which store most of the data sets and attract ad hoc, short, data-mining queries. Moreover, the availability of higher-level query languages that operate on top of these cluster systems has caused these ad hoc queries to proliferate. Since existing systems were not designed for short, latency-sensitive jobs, short interactive jobs suffer from poor response times. In this paper, we present Piranha, a system for optimizing short jobs on Hadoop without affecting the larger jobs. It runs on existing unmodified Hadoop clusters, which facilitates its adoption. Piranha exploits characteristics of short jobs learned from production workloads on Yahoo! clusters to reduce the latency of such jobs. To demonstrate Piranha's effectiveness, we evaluated its performance using three realistic short queries. Piranha was able to reduce the queries' response times by up to 71%.
Pages: 985-996
Page count: 12