Hadoop Distributed File System for Big data analysis

Cited by: 2
Authors
Almansouri, Hatim Talal [1]
Masmoudi, Youssef [1]
Affiliations
[1] Saudi Elect Univ, Riyadh, Saudi Arabia
Source
PROCEEDINGS OF 2019 IEEE 4TH WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS'19) | 2019
Keywords
Hadoop; MapReduce; HDFS; DataNode; NameNode; Big Data Analysis;
DOI
10.1109/icocs.2019.8930804
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Hadoop is a framework for processing data volumes too large for conventional systems to handle. Its file management system, the Hadoop Distributed File System (HDFS), consists of a NameNode and DataNodes, and the data is divided into blocks based on the total size of the dataset. In addition, Hadoop provides MapReduce, in which the dataset is processed in a mapping phase followed by a reducing phase. Using Hadoop for big data analysis has revealed important information that can be used for analytical purposes and for enabling new products. Big data can be found in many different sources, such as social networks, web server logs, broadcast audio streams, and banking transactions. In this paper, we illustrate the main steps to set up Hadoop and MapReduce. The version illustrated in this work is Hadoop 3.1.1, the latest release, applied to big data analysis. Simplified pseudocode is provided to show the functionality of the Map class and the Reduce class. The developed steps are applied to a given example that could be generalized to larger data.
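The map-then-reduce flow described in the abstract can be sketched in plain Python. This is only an illustrative word-count sketch, not the paper's pseudocode and not Hadoop's API: the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and everything runs in one process rather than across DataNodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, standing in for the framework's
    shuffle between the map and reduce phases (assumed, simplified)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big hadoop"]))
```

In a real Hadoop job the map and reduce functions would run on separate nodes over HDFS blocks, and the grouping step would be handled by the framework rather than by user code.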
Pages: 257-261
Page count: 5