High Performance Hadoop Distributed File System

Cited by: 12
Authors
Elkawkagy, Mohamed [1]
Elbeh, Heba [1]
Affiliations
[1] Menoufia Univ, Comp Sci Dept, Shibin Al Kawm, Menoufia, Egypt
Keywords
Cloud; HDFS; fault tolerance; reliability;
DOI
10.2991/ijndc.k.200515.007
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Although most companies are expected to be running 1000-node Hadoop clusters by the end of 2020, Hadoop deployments still face many challenges, such as security, fault tolerance, and flexibility. Hadoop is a software framework for handling big data, and it includes a distributed file system called the Hadoop Distributed File System (HDFS). HDFS achieves fault tolerance through data replication: it copies the data to multiple DataNodes, which provides reliability and availability. Although data replication works well, it still wastes considerable time because it uses a single-pipeline paradigm. The proposed approach improves the performance of HDFS by transferring data blocks over multiple pipelines instead of a single pipeline. In addition, each DataNode updates its reliability value after each round and sends the updated value to the NameNode. The NameNode sorts the DataNodes according to their reliability values. When a client submits a request to upload a data block, the NameNode replies with a list of high-reliability DataNodes that will achieve high performance. The proposed approach is fully implemented, and the experimental results show that it improves the performance of HDFS write operations.
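The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the `report` and `choose_targets` methods, the reliability scores, and the two hop-listing helpers are all hypothetical, and the abstract does not say how reliability is computed or how the multiple pipelines are wired. The sketch assumes one plausible reading, in which the client opens a separate pipeline to each chosen DataNode instead of chaining them.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    reliability: float  # hypothetical score; the abstract gives no formula


class NameNode:
    """Toy NameNode that ranks DataNodes by their last reported reliability."""

    def __init__(self):
        self.nodes = {}

    def report(self, name, reliability):
        # Called by a DataNode after each round with its updated value.
        self.nodes[name] = DataNode(name, reliability)

    def choose_targets(self, replication=3):
        # Sort by reliability, highest first, and return the top candidates.
        ranked = sorted(self.nodes.values(),
                        key=lambda n: n.reliability, reverse=True)
        return [n.name for n in ranked[:replication]]


def single_pipeline_hops(targets):
    # Chained transfer: client -> first target -> second target -> ...
    return [("client", targets[0])] + list(zip(targets, targets[1:]))


def multi_pipeline_hops(targets):
    # Assumed reading of "multiple pipelines": one direct transfer per target.
    return [("client", t) for t in targets]


namenode = NameNode()
for name, score in [("dn1", 0.70), ("dn2", 0.95), ("dn3", 0.85), ("dn4", 0.60)]:
    namenode.report(name, score)

targets = namenode.choose_targets(replication=3)
print(targets)                        # ['dn2', 'dn3', 'dn1']
print(single_pipeline_hops(targets))  # [('client', 'dn2'), ('dn2', 'dn3'), ('dn3', 'dn1')]
print(multi_pipeline_hops(targets))   # [('client', 'dn2'), ('client', 'dn3'), ('client', 'dn1')]
```

In the chained layout each replica waits on the hop before it, while in the assumed multi-pipeline layout the transfers are independent of one another, which is the kind of saving the abstract attributes to the proposed approach.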
Pages: 119-123
Number of pages: 5