FlumeJava: Easy, Efficient Data-Parallel Pipelines

Cited by: 102
Authors
Chambers, Craig
Raniwala, Ashish
Perry, Frances
Adams, Stephen
Henry, Robert R.
Bradshaw, Robert
Weizenbaum, Nathan
Source
PLDI '10: Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation | 2010
Keywords
data-parallel programming; MapReduce; Java
DOI
10.1145/1806596.1806638
Chinese Library Classification (CLC)
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
MapReduce and similar systems significantly ease the task of writing data-parallel code. However, many real-world computations require a pipeline of MapReduces, and programming and managing such pipelines can be difficult. We present FlumeJava, a Java library that makes it easy to develop, test, and run efficient data-parallel pipelines. At the core of the FlumeJava library are a couple of classes that represent immutable parallel collections, each supporting a modest number of operations for processing them in parallel. Parallel collections and their operations present a simple, high-level, uniform abstraction over different data representations and execution strategies. To enable parallel operations to run efficiently, FlumeJava defers their evaluation, instead internally constructing an execution plan dataflow graph. When the final results of the parallel operations are eventually needed, FlumeJava first optimizes the execution plan, and then executes the optimized operations on appropriate underlying primitives (e.g., MapReduces). The combination of high-level abstractions for parallel data and computation, deferred evaluation and optimization, and efficient parallel primitives yields an easy-to-use system that approaches the efficiency of hand-optimized pipelines. FlumeJava is in active use by hundreds of pipeline developers within Google.
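The deferred-evaluation idea in the abstract can be illustrated with a minimal sketch. This is not FlumeJava's API: `LazyCollection`, `of`, `map`, `plannedSteps`, and `run` are hypothetical names invented for illustration, the "execution plan" is just a list of recorded functions rather than a dataflow graph, and the "optimization" is a simple fusion of consecutive map steps into one pass run on a single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical toy class for illustration only; not FlumeJava's collection API.
final class LazyCollection<T> {
    private final List<Object> source;                  // input elements, never mutated
    private final List<Function<Object, Object>> plan;  // recorded (deferred) operations

    private LazyCollection(List<Object> source, List<Function<Object, Object>> plan) {
        this.source = source;
        this.plan = plan;
    }

    // Wraps a copy of the input; no work is done yet.
    static <T> LazyCollection<T> of(List<T> items) {
        return new LazyCollection<T>(new ArrayList<Object>(items),
                                     new ArrayList<Function<Object, Object>>());
    }

    // Records the operation in a new plan instead of applying it.
    @SuppressWarnings("unchecked")
    <R> LazyCollection<R> map(Function<? super T, ? extends R> fn) {
        List<Function<Object, Object>> next = new ArrayList<>(plan);
        next.add(x -> fn.apply((T) x));
        return new LazyCollection<R>(source, next);
    }

    // Number of operations waiting in the plan.
    int plannedSteps() {
        return plan.size();
    }

    // Evaluation happens only here: fuse all recorded steps into a single
    // function (a stand-in for plan optimization), then make one pass.
    @SuppressWarnings("unchecked")
    List<T> run() {
        Function<Object, Object> fused = Function.identity();
        for (Function<Object, Object> step : plan) {
            fused = fused.andThen(step);
        }
        List<T> out = new ArrayList<>();
        for (Object item : source) {
            out.add((T) fused.apply(item));
        }
        return out;
    }
}
```

In this sketch, calling `map` twice on a collection only grows the plan; the functions are applied when `run()` is called, at which point both steps execute in a single pass over the data.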
Pages: 363-375
Page count: 13