An Improved Data Placement Strategy in a Heterogeneous Hadoop Cluster

Cited by: 0
Authors
Zhao, Wentao [1, 2]
Meng, Lingjun [1]
Sun, Jiangfeng [1, 2]
Ding, Yang [1]
Zhao, Haohao [1]
Wang, Lina [1, 2]
Affiliations
[1] School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo
[2] Opening Project of Key Laboratory of Mine Informatization, Henan Polytechnic University, Jiaozuo, 454000, Henan
Source
Open Cybernetics and Systemics Journal | 2015 / Vol. 9 / Issue 01
Keywords
Data placement; Disk space utilization; HDFS; Network load; Nodes heterogeneity
DOI
10.2174/1874110X01509010792
Abstract
Hadoop Distributed File System (HDFS) is designed to store big data reliably and to stream that data at high bandwidth to user applications. However, the default HDFS block placement policy assumes that all nodes in the cluster are homogeneous and places blocks randomly without considering any node's resource characteristics, which reduces the self-adaptability of the system. In this paper, we take node heterogeneity into account, such as the utilization of each node's disk space, and put forward an improved block placement strategy that addresses some drawbacks of the default HDFS policy. Simulation experiments indicate that the improved strategy achieves a much better data distribution and also saves significant time compared with the default block placement. © Zhao et al.; Licensee Bentham Open.
Pages: 792-798
Page count: 6