A Comparative Analysis of Hadoop and Spark Frameworks using Word Count Algorithm

Times cited: 0
Authors
Benlachmi, Yassine [1]
El Yazidi, Abdelaziz [1]
Hasnaoui, Moulay Lahcen [1]
Affiliation
[1] ENSAM Moulay Ismail Univ, LMMI Lab, Meknes 50000, Morocco
Keywords
Big data; Hadoop; Spark; machine learning; Hadoop Distributed File System (HDFS); MapReduce; word count; BIG DATA; PLACEMENT STRATEGY; ANALYTICS; IMPACT
DOI
10.14569/IJACSA.2021.0120495
Chinese Library Classification
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
With the Big Data explosion brought about by the Information Technology (IT) revolution of the last few decades, processing and analyzing data at low cost and in minimal time has become immensely challenging. The field of Big Data analytics is driven by the demand for Machine Learning (ML) data processing, real-time streaming, and graphics processing. Two of the most widely adopted solutions for Big Data analysis in a distributed environment are Hadoop and Spark, both maintained by the Apache Software Foundation. Both are open-source data management frameworks that distribute the storage and computation of large datasets across clusters of computing nodes. This paper provides a comprehensive comparison between Apache Hadoop and Apache Spark in terms of efficiency, scalability, security, cost-effectiveness, and other parameters, and describes the primary components of each framework in order to compare their performance. The major conclusion is that Spark is better in terms of scalability and speed for real-time streaming applications, whereas Hadoop is more viable for applications dealing with larger datasets. The case study evaluates the performance of Hadoop components such as MapReduce and the Hadoop Distributed File System (HDFS) by applying them to the well-known Word Count algorithm, to assess their efficacy in terms of storage and computation time. It then analyzes how Spark's in-memory processing can reduce the computation time of the Word Count algorithm.
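The Word Count algorithm used in the case study follows the classic MapReduce pattern: mappers emit a (word, 1) pair per token, the framework groups the pairs by word (shuffle), and reducers sum each group. The sketch below runs those three phases in a single Python process to make the data flow concrete. It assumes whitespace tokenization and lower-casing, which are illustrative choices not taken from the paper, and the function names are hypothetical; it is not the authors' Hadoop or Spark code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated token (lower-cased by assumption)."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted counts by key (the word), sorted by word."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return dict(sorted(groups.items()))


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce over an iterable of text lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one line of an input split handed to a mapper.
    sample = [
        "Hadoop stores data in HDFS",
        "Spark keeps data in memory",
        "Spark and Hadoop",
    ]
    print(word_count(sample))
```

In Hadoop MapReduce these phases run as separate tasks, with map output spilled to local disk and final results written to HDFS. In Spark, the same pipeline is commonly expressed on an RDD with `flatMap`, `map`, and `reduceByKey`, and intermediate data can be kept in memory, which is the mechanism the abstract credits for Spark's lower computation time.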
Pages: 778-788
Number of pages: 11
Related papers (50 in total)
  • [1] Performance Evaluation of Hadoop Tools Using Word Count Algorithm
    Benlachmi, Yassine
    Elyazidi, Abdelaziz
    Hasnaoui, Moulay Lahcen
    ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 2, 2022, 1418: 875-887
  • [2] Performance Evaluation of Hadoop Tools Using Word Count Algorithm
    Benlachmi, Yassine
    Elyazidi, Abdelaziz
    Hasnaoui, Moulay Lahcen
    ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 1, 2022, 1417: 875+
  • [3] Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks
    Taran, Vladyslav
    Alienin, Oleg
    Stirenko, Sergii
    Gordienko, Yuri
    Rojbi, A.
    2017 IEEE INTERNATIONAL YOUNG SCIENTISTS FORUM ON APPLIED PHYSICS AND ENGINEERING (YSF), 2017: 80-83
  • [4] Performance comparison between Hadoop and Spark frameworks using HiBench benchmarks
    Samadi, Yassir
    Zbakh, Mostapha
    Tadonki, Claude
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30(12)
  • [5] Comparative Analysis of Apache Spark and Hadoop MapReduce Using Various Parameters and Execution Time
    Meena, Bhagavathula
    Sarwani, I. S. L.
    Archana, M.
    Supriya, P.
    INTELLIGENT COMPUTING AND COMMUNICATION, ICICC 2019, 2020, 1034: 719-725
  • [6] Comparative study between Hadoop and Spark based on Hibench benchmarks
    Samadi, Yassir
    Zbakh, Mostapha
    Tadonki, Claude
    2016 2ND INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGIES AND APPLICATIONS (CLOUDTECH), 2016: 267-275
  • [7] Performance Evaluation of Word Count Program Using C#, Java and Hadoop
    Yadav, Ravinder
    Kilaru, Aravind
    Srivastava, Devesh Kumar
    Dahiya, Priyanka
    SMART TRENDS IN INFORMATION TECHNOLOGY AND COMPUTER COMMUNICATIONS, SMARTCOM 2016, 2016, 628: 299-307
  • [8] Comparative analysis of Hadoop tools and Spark technology
    Wakde, Aniket
    Shende, Purvesh
    Waydande, Sudarshan
    Uttarwar, Shravani
    Deshmukh, Ganesh
    2018 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2018
  • [9] Performance Analysis of Distributed Computing Frameworks for Big Data Analytics: Hadoop Vs Spark
    Ketu, Shwet
    Mishra, Pramod Kumar
    Agarwal, Sonali
    COMPUTACION Y SISTEMAS, 2020, 24(02): 669-686
  • [10] Present and importance of the implementation of Big Data using the Hadoop and Spark tools
    Montoya Suarez, Lina
    Gil Restrepo, Gustavo Andres
    REVISTA DIGITAL LAMPSAKOS, 2018, (19): 67-72