Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Cited by: 0

Authors:

Taran, Vladyslav [1]

Alienin, Oleg [1]

Stirenko, Sergii [1]

Gordienko, Yuri [1]

Rojbi, A. [2]

Affiliations:

[1] Natl Tech Univ Ukraine, Igor Sikorsky Kyiv Polytech Inst, Kiev, Ukraine

[2] Univ Paris 08, CHArt Lab, Human & Artificial Cognit, 2 Rue Liberte, F-93526 St Denis, France

Source:

2017 IEEE INTERNATIONAL YOUNG SCIENTISTS FORUM ON APPLIED PHYSICS AND ENGINEERING (YSF) | 2017

Keywords:

information systems; Big Data; distributed computing; clusters; Hadoop; Spark; speedup; machine learning; multimodal interactions; data image processing and recognition

DOI:

Not available

Chinese Library Classification (CLC) number:

T [Industrial Technology]

Discipline classification code:

08

Abstract:

Recently, due to the rapid development of information and communication technologies, data are being created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of word counting in texts of various sizes. It was found that the running times grow very fast with the dataset size, even faster than a power function. This tendency is similar for both the Hadoop and Spark frameworks in the real and virtual versions of the cluster implementations. Moreover, speedup values decrease significantly as the dataset size grows, especially for the virtual version of the cluster configuration. The problem of growing data generated by IoT and multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye tracking, etc.) interaction channels is presented. In the context of this problem, the current observations on running times and speedup for the Hadoop and Spark frameworks in real and virtual cluster configurations can be very useful for proper scaling-up and efficient job management, especially for machine learning and Deep Learning applications, where Big Data are widely present.
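
For readers unfamiliar with the test task, the following is a minimal single-process sketch of map/reduce-style word counting, together with a speedup calculation. It is an illustration only and not the authors' code: it uses neither Hadoop nor Spark, the function names are hypothetical, and the speedup formula (baseline running time divided by cluster running time) is the conventional definition, assumed here because the abstract does not state which definition was used.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk: str) -> Counter:
    """Count words in one text chunk (the 'map' step a single node would run)."""
    return Counter(chunk.lower().split())


def reduce_phase(partials: list[Counter]) -> Counter:
    """Merge per-chunk counts into one global count (the 'reduce' step)."""
    return reduce(lambda a, b: a + b, partials, Counter())


def word_count(chunks: list[str]) -> Counter:
    """Run the map step on every chunk, then reduce the partial results."""
    return reduce_phase([map_phase(chunk) for chunk in chunks])


def speedup(t_baseline: float, t_cluster: float) -> float:
    """Assumed conventional definition: baseline time over cluster time."""
    if t_cluster <= 0:
        raise ValueError("t_cluster must be positive")
    return t_baseline / t_cluster
```

In a real cluster each chunk would be processed on a different node and the partial counts shuffled across the network before merging, which is where much of the overhead affecting speedup tends to arise.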

Pages: 80-83

Number of pages: 4
