HPSO: Prefetching Based Scheduling to Improve Data Locality for MapReduce Clusters

Cited by: 0
Authors
Sun, Mingming [1]
Zhuang, Hang [1]
Zhou, Xuehai [1]
Lu, Kun [1]
Li, Changlong [1]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
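Source
ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2014, PT II | 2014 / Vol. 8631
Keywords
Data locality; MapReduce clusters; prefetching; task scheduler
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract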
Because of competition for cluster resources and the task scheduling policy, some map tasks are assigned to nodes that do not hold their input data, which causes significant data access delay. Data locality is therefore becoming one of the most critical factors affecting the performance of MapReduce clusters. Machines in MapReduce clusters have large memory capacities that are often underutilized, so prefetching input data into memory is an effective way to improve data locality. However, deciding what to prefetch and when to prefetch it still poses serious challenges to cluster designers. To use prefetching effectively, we have built HPSO (High Performance Scheduling Optimizer), a task scheduler based on a prefetching service that improves data locality for MapReduce jobs. The basic idea is to predict the most appropriate nodes to which future map tasks should be assigned and then preload the input data into memory on those nodes without delaying the launch of new tasks. We have implemented HPSO in Hadoop-1.1.2. The experimental results show that the method reduces the number of map tasks that incur remote data access delay and improves the performance of Hadoop clusters.
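The predict-then-preload idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how HPSO predicts nodes or chooses data, so the prediction rule used here (pick the node whose busy map slot is expected to finish soonest), the `Node` class, and the functions `next_free_node` and `plan_prefetch` are all hypothetical, as are the per-slot remaining-time estimates.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of a worker node as seen by the scheduler."""
    name: str
    # Assumed input: estimated seconds until each busy map slot becomes free.
    slot_remaining: list
    # Input blocks stored on this node's local disk.
    local_blocks: set = field(default_factory=set)
    # Input blocks already preloaded into this node's memory.
    prefetched: set = field(default_factory=set)


def next_free_node(nodes):
    """Predict the node expected to free a map slot soonest (assumed rule)."""
    # A node with no busy slots is treated as free right now (default=0.0).
    return min(nodes, key=lambda n: min(n.slot_remaining, default=0.0))


def plan_prefetch(nodes, pending_blocks):
    """Return (node_name, block) to preload, or None if nothing is needed."""
    if not nodes or not pending_blocks:
        return None
    target = next_free_node(nodes)
    available = target.local_blocks | target.prefetched
    # If a pending task's input is already on the target node, that task
    # can run data-local there and no prefetch is required.
    if any(block in available for block in pending_blocks):
        return None
    # Otherwise preload the input of the first pending task, so the data
    # is in memory by the time the slot frees up.
    block = pending_blocks[0]
    target.prefetched.add(block)
    return (target.name, block)


if __name__ == "__main__":
    cluster = [
        Node("node-a", [30.0, 12.0], {"blk-1"}),
        Node("node-b", [5.0], {"blk-2"}),
    ]
    print(plan_prefetch(cluster, ["blk-1", "blk-3"]))
```

In this sketch the prefetch decision is made before the slot is free, so the task launch itself is never held back; whether HPSO overlaps prefetching and execution in the same way is not stated in the abstract.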
Pages: 82-95
Number of pages: 14