Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Authors
Taran, Vladyslav [1]
Alienin, Oleg [1]
Stirenko, Sergii [1]
Gordienko, Yuri [1]
Rojbi, A. [2]
Affiliations
[1] Natl Tech Univ Ukraine, Igor Sikorsky Kyiv Polytech Inst, Kiev, Ukraine
[2] Univ Paris 08, CHArt Lab, Human & Artificial Cognit, 2 Rue Liberte, F-93526 St Denis, France
Source
2017 IEEE INTERNATIONAL YOUNG SCIENTISTS FORUM ON APPLIED PHYSICS AND ENGINEERING (YSF) | 2017
Keywords
information systems; Big Data; distributed computing; clusters; Hadoop; Spark; speedup; machine learning; multimodal interactions; data image processing and recognition;
DOI
Not available
Chinese Library Classification
T [Industrial Technology];
Discipline classification code
08;
Abstract
Recently, owing to the rapid development of information and communication technologies, data are being created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of word counting in texts of various sizes. It was found that the running times grow very fast with the dataset size, faster even than a power function. For both the real and virtual cluster implementations, this tendency is similar for the Hadoop and Spark frameworks. Moreover, speedup values decrease significantly as the dataset size grows, especially for the virtual cluster configuration. The problem of growing data generated by IoT and multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye tracking, etc.) interaction channels is presented. In the context of this problem, the current observations on running times and speedup for the Hadoop and Spark frameworks in real and virtual cluster configurations can be very useful for proper scaling-up and efficient job management, especially for machine learning and Deep Learning applications, where Big Data are widely present.
Pages: 80-83
Number of pages: 4
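The word-counting test task and the speedup measure mentioned in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' benchmark code: the function names `map_count`, `reduce_counts`, `word_count` and `speedup` are hypothetical, the tokenization rule is an assumption, and speedup is taken here in its conventional sense (baseline running time divided by cluster running time), which the abstract itself does not define.

```python
from collections import Counter
from functools import reduce
import re


def map_count(chunk: str) -> Counter:
    """Map step: count the words in one chunk of text.

    Assumption: a word is a run of lowercase letters, digits or apostrophes.
    """
    return Counter(re.findall(r"[a-z0-9']+", chunk.lower()))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts."""
    return left + right


def word_count(chunks) -> Counter:
    """Apply the map step to each chunk, then fold the partial counts together."""
    return reduce(reduce_counts, map(map_count, chunks), Counter())


def speedup(t_baseline: float, t_cluster: float) -> float:
    """Conventional speedup: baseline running time over cluster running time."""
    return t_baseline / t_cluster


if __name__ == "__main__":
    counts = word_count(["big data big clusters", "data nodes"])
    print(counts.most_common())
    print(speedup(10.0, 4.0))
```

In Hadoop or Spark the chunks would be input splits or partitions held on different nodes, and the merge would run as a distributed reduce rather than a local fold; the sketch only mirrors the logical structure of the job.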