SIDR: Structure-Aware Intelligent Data Routing in Hadoop

Cited by: 1
Authors
Buck, Joe [1]
Watkins, Noah [1]
Levin, Greg [1]
Crume, Adam [1]
Ioannidou, Kleoni [1]
Brandt, Scott [1]
Maltzahn, Carlos [1]
Polyzotis, Neoklis [1]
Torres, Aaron [2]
Affiliations
[1] Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
[2] Los Alamos Natl Lab, Los Alamos, NM 87545 USA
Source
2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC) | 2013
DOI
10.1145/2503210.2503241
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The MapReduce framework is being extended for domains quite different from the web applications for which it was designed, including the processing of big structured data, e.g., scientific and financial data. Previous work using MapReduce to process scientific data ignores existing structure when assigning intermediate data and scheduling tasks. In this paper, we present a method for incorporating knowledge of the structure of scientific data and of the executing query into the MapReduce communication model. Built in SciHadoop, a version of the Hadoop MapReduce framework for scientific data, SIDR intelligently partitions and routes intermediate data, allowing it to: remove Hadoop's global barrier and execute Reduce tasks prior to all Map tasks completing; minimize intermediate key skew; and produce early, correct results. SIDR executes queries up to 2.5 times faster than Hadoop and 37% faster than SciHadoop; produces initial results with only 6% of the query completed; and produces dense, contiguous output.
Pages: 12