Hadoop Distributed File System for Big data analysis

Cited by: 2
Authors
Almansouri, Hatim Talal [1]
Masmoudi, Youssef [1]
Affiliation
[1] Saudi Elect Univ, Riyadh, Saudi Arabia
Source
PROCEEDINGS OF 2019 IEEE 4TH WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS'19) | 2019
Keywords
Hadoop; MapReduce; HDFS; DataNode; NameNode; Big Data Analysis;
DOI
10.1109/icocs.2019.8930804
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Hadoop is a framework for processing data volumes too large to be handled by conventional systems. Its file management system, the Hadoop Distributed File System (HDFS), consists of a NameNode and DataNodes and divides the data into blocks based on the total size of the dataset. In addition, Hadoop provides MapReduce, in which the dataset is processed in a mapping phase followed by a reducing phase. Using Hadoop for big data analysis has revealed important information that can be used for analytical purposes and for enabling new products. Big data can be found in many different sources, such as social networks, web server logs, broadcast audio streams, and banking transactions. In this paper, we illustrate the main steps to set up Hadoop and MapReduce. The version illustrated in this work is the latest release, Hadoop 3.1.1, for big data analysis. Simplified pseudocode is provided to show the functionality of the Map class and the Reduce class. The developed steps are applied to a given example that could be generalized to larger data.
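The mapping and reducing phases mentioned in the abstract can be sketched in a few lines of plain Python. This is not the paper's pseudocode and it uses no Hadoop API. It is only a minimal illustration, assuming word counting as the example task, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, the work a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

In a real cluster the input lines would come from HDFS blocks and the map and reduce steps would run on different nodes; here everything runs in one process to keep the data flow visible.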
Pages: 257-261
Page count: 5