A Comparative Analysis of Hadoop and Spark Frameworks using Word Count Algorithm

Cited by: 0

Authors:

Benlachmi, Yassine [1]

El Yazidi, Abdelaziz [1]

Hasnaoui, Moulay Lahcen [1]

Affiliations:

[1] ENSAM Moulay Ismail Univ, LMMI Lab, Meknes 50000, Morocco

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2021, Vol. 12, No. 04

Keywords:

Big data; Hadoop; Spark; machine learning; Hadoop Distributed File System (HDFS); MapReduce; word count; BIG DATA; PLACEMENT STRATEGY; ANALYTICS; IMPACT

DOI:

10.14569/IJACSA.2021.0120495

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

With the Big Data explosion brought about by the Information Technology (IT) revolution of the last few decades, processing and analyzing data at low cost and in minimum time has become immensely challenging. The field of Big Data analytics is driven by the demand for processing Machine Learning (ML) data and real-time streaming data, and for graphics processing. The most efficient solutions for Big Data analysis in a distributed environment are Hadoop and Spark, both administered by Apache. Both are open-source data management frameworks that distribute and process large datasets across multiple clusters of computing nodes. This paper provides a comprehensive comparison of Apache Hadoop and Apache Spark in terms of efficiency, scalability, security, cost-effectiveness, and other parameters. It describes the primary components of the Hadoop and Spark frameworks in order to compare their performance. The major conclusion is that Spark is better in terms of scalability and speed for real-time streaming applications, whereas Hadoop is more viable for applications dealing with bigger datasets. The case study evaluates the performance of various Hadoop components, such as MapReduce and the Hadoop Distributed File System (HDFS), by applying them to the well-known Word Count algorithm to ascertain their efficacy in terms of storage and computational time. It then analyzes how Spark's in-memory processing could reduce the computational time of the Word Count algorithm.
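
As a rough illustration of the Word Count computation that the abstract refers to, the following is a minimal single-process Python sketch of the map, shuffle, and reduce steps. It is not the authors' code and uses neither Hadoop nor Spark; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by word (done by the framework in a real cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, not data from the paper.
    sample = ["big data big clusters", "spark and hadoop process big data"]
    print(word_count(sample))
```

In a real deployment the framework would distribute the map and reduce calls across nodes and perform the shuffle itself; here everything runs in one process and only the data flow is shown.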


Pages: 778-788

Number of pages: 11
