Adaptive Replication Management in HDFS Based on Supervised Learning

Cited by: 30
Authors
Bui, Dinh-Mao [1]
Hussain, Shujaat [1]
Huh, Eui-Nam [1]
Lee, Sungyoung [1]
Affiliations
[1] Kyung Hee Univ, Dept Comp Engn, Suwon 446701, South Korea
Funding
National Research Foundation of Singapore
Keywords
Replication; HDFS; proactive prediction; optimization; Bayesian learning; Gaussian process; erasure codes
DOI
10.1109/TKDE.2016.2523510
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The number of applications based on Apache Hadoop is dramatically increasing due to the robustness and dynamic features of this system. At the heart of Apache Hadoop, the Hadoop Distributed File System (HDFS) provides the reliability and high availability for computation by applying a static replication by default. However, because of the characteristics of parallel operations on the application layer, the access rate for each data file in HDFS is completely different. Consequently, maintaining the same replication mechanism for every data file leads to detrimental effects on the performance. By rigorously considering the drawbacks of the HDFS replication, this paper proposes an approach to dynamically replicate the data file based on the predictive analysis. With the help of probability theory, the utilization of each data file can be predicted to create a corresponding replication strategy. Eventually, the popular files can be subsequently replicated according to their own access potentials. For the remaining low potential files, an erasure code is applied to maintain the reliability. Hence, our approach simultaneously improves the availability while keeping the reliability in comparison to the default scheme. Furthermore, the complexity reduction is applied to enhance the effectiveness of the prediction when dealing with Big Data.
Pages: 1369-1382
Page count: 14