Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Cited by: 0

Authors:

Taran, Vladyslav [1]

Alienin, Oleg [1]

Stirenko, Sergii [1]

Gordienko, Yuri [1]

Rojbi, A. [2]

Affiliations:

[1] Natl Tech Univ Ukraine, Igor Sikorsky Kyiv Polytech Inst, Kiev, Ukraine

[2] Univ Paris 08, CHArt Lab, Human & Artificial Cognit, 2 Rue Liberte, F-93526 St Denis, France

Source:

2017 IEEE INTERNATIONAL YOUNG SCIENTISTS FORUM ON APPLIED PHYSICS AND ENGINEERING (YSF) | 2017

Keywords:

information systems; Big Data; distributed computing; clusters; Hadoop; Spark; speedup; machine learning; multimodal interactions; data image processing and recognition;

DOI:

Not available

Chinese Library Classification number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

Recently, owing to the rapid development of information and communication technologies, data are being created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of word counting in texts of various sizes. It was found that the running times grow very fast with the dataset size, faster even than a power function. For the real and virtual cluster implementations, this tendency is similar for both the Hadoop and Spark frameworks. Moreover, speedup values decrease significantly as the dataset size grows, especially for the virtual cluster configuration. The problem of growing data generated by IoT and multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye tracking, etc.) interaction channels is presented. In the context of this problem, the current observations on running times and speedup for the Hadoop and Spark frameworks in real and virtual cluster configurations can be very useful for proper scaling-up and efficient job management, especially for machine learning and Deep Learning applications, where Big Data are widely present.
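
The word-counting test task and the speedup metric mentioned in the abstract can be illustrated with a minimal single-machine sketch in Python. This is not the authors' benchmark code: the function names (`map_phase`, `reduce_phase`, `word_count`, `speedup`) are hypothetical, the tokenization rule is an assumption, and speedup is taken here in its conventional sense (baseline running time divided by parallel running time), which the abstract does not define explicitly.

```python
import re
from collections import Counter
from functools import reduce
from typing import Iterable


def map_phase(chunk: str) -> Counter:
    """Map step: count the words in one chunk of text (assumed tokenization)."""
    return Counter(re.findall(r"[a-z']+", chunk.lower()))


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(chunks: Iterable[str]) -> Counter:
    """Apply the map step to every chunk, then fold the partial counts together."""
    return reduce(reduce_phase, map(map_phase, chunks), Counter())


def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """Conventional speedup (assumed definition): baseline time / parallel time."""
    return baseline_seconds / parallel_seconds


if __name__ == "__main__":
    # Each string stands in for a block of input handled by one worker.
    chunks = ["to be or not to be", "that is the question"]
    print(word_count(chunks))
    # Illustrative timings only, not measurements from the paper.
    print(speedup(120.0, 40.0))
```

In a real Hadoop or Spark deployment the map and reduce steps would run on separate compute nodes; the sketch only shows the shape of the computation whose running time the paper measures.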


Pages: 80-83

Page count: 4


