An improved data placement strategy in a heterogeneous Hadoop cluster

Cited by: 0
Authors
Zhao, Wentao [1, 2]
Meng, Lingjun [1]
Sun, Jiangfeng [1, 2]
Ding, Yang [1]
Zhao, Haohao [1]
Wang, Lina [1, 2]
Affiliations
[1] School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo
[2] Opening Project of Key Laboratory of Mine Informatization, Henan Polytechnic University, Jiaozuo, 454000, Henan
Keywords
Data placement; Disk space utilization; HDFS; Network load; Node heterogeneity
DOI
10.2174/1874110X01408010957
Abstract
Hadoop Distributed File System (HDFS) is designed to store big data reliably and to stream those data at high bandwidth to user applications. However, the default HDFS block placement policy assumes that all nodes in the cluster are homogeneous and places blocks randomly without considering any node's resource characteristics, which reduces the system's self-adaptability. In this paper, we take node heterogeneity into account, such as the utilization of each node's disk space, and propose an improved block placement strategy that addresses some drawbacks of the default HDFS policy. Simulation experiments indicate that the improved strategy achieves a much better data distribution and also takes significantly less time than the default block placement. © Zhao et al.
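The abstract does not spell out the placement algorithm, so the following is only a minimal sketch of the general idea of utilization-aware placement, not the authors' strategy. It assumes a simplified model in which each block goes to the node with the lowest disk utilization that still has room. The names `Node`, `pick_least_utilized`, and `place_blocks` are hypothetical and are not part of Hadoop.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a DataNode; not a Hadoop class."""
    name: str
    capacity_gb: float
    used_gb: float = 0.0

    @property
    def utilization(self) -> float:
        # Fraction of the node's disk that is already occupied.
        return self.used_gb / self.capacity_gb


def pick_least_utilized(nodes, block_gb, replicas=1):
    """Choose `replicas` distinct nodes with the lowest disk utilization.

    Assumption: utilization is the only criterion. A real policy would
    also weigh rack topology, network load, and node health.
    """
    candidates = [n for n in nodes if n.capacity_gb - n.used_gb >= block_gb]
    if len(candidates) < replicas:
        raise ValueError("not enough nodes with free space for the block")
    # Sort by utilization; break ties by name so the result is deterministic.
    chosen = sorted(candidates, key=lambda n: (n.utilization, n.name))[:replicas]
    for node in chosen:
        node.used_gb += block_gb
    return chosen


def place_blocks(nodes, num_blocks, block_gb=0.125, replicas=1):
    """Place `num_blocks` blocks and return each node's final utilization."""
    for _ in range(num_blocks):
        pick_least_utilized(nodes, block_gb, replicas)
    return {n.name: round(n.utilization, 3) for n in nodes}
```

Under these assumptions, placing 400 blocks of 1 GB on one 100 GB node and one 400 GB node leaves both at 80% utilization, whereas a uniform random choice would tend to fill the smaller node first.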
Pages: 957-963
Page count: 6