OPTIMIZING HADOOP DATA LOCALITY: PERFORMANCE ENHANCEMENT STRATEGIES IN HETEROGENEOUS COMPUTING ENVIRONMENTS

Times cited: 0
Authors
Kim, Si-Yeong [1]
Kim, Tai-Hoon [1]
Affiliation
[1] Chonnam Natl Univ, Sch Elect & Comp Engn, Yeosu Campus, Gwangju 59626, South Korea
Source
SCALABLE COMPUTING-PRACTICE AND EXPERIENCE | 2024 / Vol. 25 / No. 06
Keywords
Hadoop; Data Locality; Performance Enhancement; Heterogeneous Computing; Distributed Computing; Big Data
DOI
10.12694/scpe.v25i6.3294
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
As organizations increasingly rely on big data for analytics and decision-making, efficient processing of massive datasets becomes essential. Hadoop, a widely adopted distributed computing framework, is well suited to large-scale data processing, but its performance depends on effective data locality, which is difficult to achieve in heterogeneous environments built from diverse hardware. This research addresses the need to improve Hadoop's data locality performance in such environments. It explores strategies for optimizing data placement and task scheduling that account for the differing characteristics of the nodes in the infrastructure. Based on an analysis of Hadoop's data locality algorithms and their impact on performance, the work proposes new approaches to mitigate the challenges posed by disparate hardware capabilities. The proposed approach, WELM-FF, combines the Weighted Extreme Learning Machine (WELM) with the Firefly Algorithm, an integration that holds promise for enhancing machine learning models in large-scale data processing. The research combines theoretical analysis with practical experiments to evaluate the effectiveness of the proposed enhancements. Factors such as network latency, disk I/O, and CPU capability are taken into account to build a holistic framework for improving data locality and, consequently, overall Hadoop performance. The findings contribute insights to the field of distributed computing and offer practical recommendations for organizations seeking to maximize the efficiency of their Hadoop deployments in heterogeneous computing environments.
By addressing the intricacies of data locality, this research aims to improve the scalability and performance of Hadoop clusters and thereby enable more effective use of big data resources.
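The abstract names the Weighted Extreme Learning Machine but does not give its formulation. The following is a minimal pure-Python sketch of the weighted ELM closed form commonly used for imbalanced classification, beta = (I/C + H^T W H)^(-1) H^T W T, where H holds the outputs of a random sigmoid hidden layer and W is a diagonal matrix that weights each sample by the inverse of its class size. The class name `WeightedELM`, the hidden-layer size, the regularization constant C, the weighting scheme, and the toy data are illustrative assumptions, not the authors' implementation, and the sketch says nothing about how the paper applies WELM to data locality.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve a x = b (a: square list of lists) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class WeightedELM:
    """Hypothetical binary weighted ELM: targets are -1 / +1, one output unit."""

    def __init__(self, n_hidden=10, c=1000.0, seed=0):
        self.n_hidden = n_hidden
        self.c = c
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Random, untrained hidden layer: sigmoid(w . x + b) for each neuron.
        return [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.w, self.b)]

    def fit(self, xs, ts):
        d = len(xs[0])
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(self.n_hidden)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        # Assumed weighting scheme: each sample weighted by 1 / (size of its class).
        counts = {}
        for t in ts:
            counts[t] = counts.get(t, 0) + 1
        weights = [1.0 / counts[t] for t in ts]
        h = [self._hidden(x) for x in xs]
        n_h = self.n_hidden
        # Normal equations of the regularized weighted least-squares problem:
        # (I/C + H^T W H) beta = H^T W T
        a = [[(1.0 / self.c if p == q else 0.0)
              + sum(wt * row[p] * row[q] for wt, row in zip(weights, h))
              for q in range(n_h)] for p in range(n_h)]
        rhs = [sum(wt * row[p] * t for wt, row, t in zip(weights, h, ts)) for p in range(n_h)]
        self.beta = solve(a, rhs)
        return self

    def decision(self, x):
        return sum(bp * hp for bp, hp in zip(self.beta, self._hidden(x)))

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1


# Imbalanced toy data: six negative samples, two positive samples.
xs = [[-1.0], [-0.9], [-0.8], [-0.7], [-0.6], [-0.5], [0.7], [0.9]]
ts = [-1, -1, -1, -1, -1, -1, 1, 1]
model = WeightedELM(n_hidden=10, c=1000.0, seed=0).fit(xs, ts)
print([model.predict(x) for x in xs])
```

Weighting each sample by 1/(class size) makes the two positive samples count as much in total as the six negative ones, which is the usual motivation for the weighted variant over a plain ELM.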
Pages: 4558-4575
Number of pages: 18
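The abstract pairs WELM with the Firefly Algorithm but does not state what the algorithm optimizes. A common use of metaheuristics alongside ELM is tuning hyperparameters such as C or the number of hidden neurons; the sketch below shows only the standard Firefly Algorithm itself (attractiveness beta0 * exp(-gamma * r^2) plus a decaying random step alpha) minimizing a stand-in sphere function. The function names, parameter values, and objective are assumptions for illustration; in a WELM-FF-style setup the objective would presumably be a validation error, which the source does not specify.

```python
import math
import random


def sphere(x):
    """Stand-in objective (assumption); a real setup would score a trained model instead."""
    return sum(v * v for v in x)


def firefly_minimize(objective, dim, lower, upper, n_fireflies=25, n_iter=100,
                     beta0=1.0, gamma=0.01, alpha0=0.2, alpha_decay=0.97, seed=0):
    """Standard Firefly Algorithm; lower cost means a brighter firefly."""
    rng = random.Random(seed)
    span = upper - lower
    pos = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pos]
    best_i = min(range(n_fireflies), key=lambda i: cost[i])
    best_pos, best_cost = pos[best_i][:], cost[best_i]
    alpha = alpha0
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    pos[i] = [
                        # attraction step + uniform random step, clipped to the bounds
                        min(upper, max(lower, a + beta * (b - a)
                                       + alpha * (rng.random() - 0.5) * span))
                        for a, b in zip(pos[i], pos[j])
                    ]
                    cost[i] = objective(pos[i])
                    if cost[i] < best_cost:
                        best_pos, best_cost = pos[i][:], cost[i]
        alpha *= alpha_decay  # shrink the random step over time
    return best_pos, best_cost


best_pos, best_cost = firefly_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
print(best_pos, best_cost)
```

A firefly moves toward every firefly that currently has a lower objective value, and the best position seen during the search is returned; the decaying alpha trades early exploration for later refinement.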