OPTIMIZING HADOOP DATA LOCALITY: PERFORMANCE ENHANCEMENT STRATEGIES IN HETEROGENEOUS COMPUTING ENVIRONMENTS

Cited by: 0
Authors
Kim, Si-Yeong [1]
Kim, Tai-Hoon [1]
Affiliations
[1] Chonnam Natl Univ, Sch Elect & Comp Engn, Yeosu Campus, Gwangju 59626, South Korea
Source
SCALABLE COMPUTING-PRACTICE AND EXPERIENCE | 2024 / Vol. 25 / Issue 06
Keywords
Hadoop; Data Locality; Performance Enhancement; Heterogeneous Computing; Distributed Computing; Big Data
DOI
10.12694/scpe.v25i6.3294
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
As organizations increasingly harness big data for analytics and decision-making, the efficient processing of massive datasets becomes paramount. Hadoop, a widely adopted distributed computing framework, excels in processing large-scale data. However, its performance is contingent on effective data locality, which becomes challenging in heterogeneous computing environments comprising diverse hardware resources. This research addresses the imperative of enhancing Hadoop's data locality performance in heterogeneous computing environments. The study explores strategies to optimize data placement and task scheduling, considering the diverse characteristics of nodes within the infrastructure. Through a comprehensive analysis of Hadoop's data locality algorithms and their impact on performance, this work proposes novel approaches to mitigate challenges associated with disparate hardware capabilities. Weighted Extreme Learning Machine Technique (Weighted ELM) with the Firefly Algorithm (WELM-FF) is used in the proposed work. The integration of Weighted Extreme Learning Machine (WELM) with the Firefly Algorithm holds promise for enhancing machine learning models in the context of large-scale data processing. The research employs a combination of theoretical analysis and practical experiments to evaluate the effectiveness of the proposed enhancements. Factors such as network latency, disk I/O, and CPU capabilities are taken into account to develop a holistic framework for improving data locality and, consequently, overall Hadoop performance. The findings presented in this study contribute valuable insights to the field of distributed computing, offering practical recommendations for organizations seeking to maximize the efficiency of their Hadoop deployments in heterogeneous computing environments. By addressing the intricacies of data locality, this research strives to enhance the scalability and performance of Hadoop clusters, thereby facilitating more effective utilization of big data resources.
Pages: 4558 - 4575
Number of pages: 18
Related papers
27 in total
  • [1] Novel data-placement scheme for improving the data locality of Hadoop in heterogeneous environments
    Bae, Minho
    Yeo, Sangho
    Park, Gyudong
    Oh, Sangyoon
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (18):
  • [2] Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks
    Taran, Vladyslav
    Alienin, Oleg
    Stirenko, Sergii
    Gordienko, Yuri
    Rojbi, A.
    2017 IEEE INTERNATIONAL YOUNG SCIENTISTS FORUM ON APPLIED PHYSICS AND ENGINEERING (YSF), 2017, : 80 - 83
  • [3] Hadoop Performance Analysis Model with Deep Data Locality
    Lee, Sungchul
    Jo, Ju-Yeon
    Kim, Yoohwan
    INFORMATION, 2019, 10 (07)
  • [4] Novel Data-Distribution Technique for Hadoop in Heterogeneous Cloud Environments
    Ubarhande, Vrushali
    Popescu, Alina-Madalina
    Gonzalez-Velez, Horacio
    2015 9TH INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT, AND SOFTWARE INTENSIVE SYSTEMS CISIS 2015, 2015, : 217 - 224
  • [5] An enhancement of data locality in Hadoop distributed file system
    Reddy, A. Siva Krishna
    Sujatha, Pothula
    Koti, Prasad
    Dhavachelvan, P.
    Amudhavel, J.
    BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, 2018, 11 (01) : 123 - 133
  • [6] A data locality based scheduler to enhance MapReduce performance in heterogeneous environments
    Naik, Nenavath Srinivas
    Negi, Atul
    Bapu, Tapas B. R.
    Anitha, R.
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 90 : 423 - 434
  • [7] Scheduling of Big Data Workflows in the Hadoop Framework with Heterogeneous Computing Cluster
    Rahmani, Amir Masoud
    Chamzini, Ehsan Yazdani
    Pourshaban, Mohsen
    Hosseinzadeh, Mehdi
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2024,
  • [8] A Dynamic Data Placement Strategy for Hadoop in Heterogeneous Environments
    Lee, Chia-Wei
    Hsieh, Kuang-Yu
    Hsieh, Sun-Yuan
    Hsiao, Hung-Chang
    BIG DATA RESEARCH, 2014, 1 : 14 - 22
  • [9] Optimizing Data Locality by Executor Allocation in Spark Computing Environment
    Fu, Zhongming
    He, Mengsi
    Tang, Zhuo
    Zhang, Yang
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2023, 20 (01) : 491 - 512
  • [10] Data locality-aware and QoS-aware dynamic cloud workflow scheduling in Hadoop for heterogeneous environment
    Ding, Fan
    Ma, Minjin
    INTERNATIONAL JOURNAL OF WEB AND GRID SERVICES, 2023, 19 (01) : 113 - 135