Improving big data analytics data processing speed through map reduce scheduling and replica placement with HDFS using genetic optimization techniques

Cited by: 0
Authors
Sundara Kumar M.R. [1]
Mohan H.S. [2]
Affiliations
[1] Department of ISE, New Horizon College of Engineering, Bangalore, Visvesvaraya Technological University, Belagavi
[2] Department of CSE (Data Science), RNS Institute of Technology, Bangalore, Visvesvaraya Technological University, Belagavi
Keywords
Big data analytics; genetic algorithm-based optimization; Hadoop distributed file system; map reduce scheduling; map reduce scheduling based non-dominated sorting genetic algorithm; non-dominated sorting genetic algorithm
DOI
10.3233/JIFS-240069
Abstract
Big Data Analytics (BDA) is an indispensable technique in today's digital world for handling the massive volumes of data generated by online and internet sources. This data is kept in repositories and processed by cluster nodes distributed across a wide network. Because of its scale and real-time generation, big data processing faces latency and throughput challenges. Modern systems such as Hadoop and Spark manage large amounts of data with HDFS, Map Reduce, and in-memory analytics, but the migration cost is higher than usual. Map Reduce Scheduling (MRS) and data replication, combined with Genetic Algorithm-based Optimization (GABO), have been proposed to address this challenge. With the multi-objective solutions that a genetic algorithm provides, better resource utilization and node availability improve processing performance in large data environments. This work develops a novel strategy for enhancing data processing performance in big data analytics, called Map Reduce Scheduling Based Non-Dominated Sorting Genetic Algorithm (MRSNSGA). The Hadoop Map Reduce paradigm handles the placement of data as chunks in distributed blocks and the scheduling of those blocks among the cluster nodes of a wide network. Best-fit solutions with low latency and low access time are extracted from the set of multi-objective solutions. Experiments were carried out in simulation with several inputs that varied node-location data and cluster racks. The results show that data processing speed in big data analytics improved by 30-35% over previous methodologies, and that the optimization approach located the best solutions among the multi-objective candidates at a rate of 24-30% across cluster nodes. © 2024 - IOS Press. All rights reserved.
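The non-dominated sorting step that gives the approach its name can be illustrated with a minimal sketch. This is not the authors' MRSNSGA implementation: it assumes two objectives, latency and access time, both to be minimized, and the candidate plans and their values are hypothetical numbers chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical candidate: (label, latency_ms, access_time_ms).
# Both objectives are assumed to be minimized.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def non_dominated_sort(candidates: List[Candidate]) -> List[List[Candidate]]:
    """Split candidates into successive Pareto fronts (front 0 is the best)."""
    remaining = list(candidates)
    fronts: List[List[Candidate]] = []
    while remaining:
        # A candidate belongs to the current front if nothing left dominates it.
        front = [c for c in remaining
                 if not any(dominates(other, c) for other in remaining)]
        fronts.append(front)
        remaining = [c for c in remaining if c not in front]
    return fronts


# Hypothetical scheduling/placement plans, made up for illustration.
plans = [("A", 10.0, 5.0), ("B", 8.0, 7.0), ("C", 12.0, 6.0), ("D", 9.0, 9.0)]
fronts = non_dominated_sort(plans)
for rank, front in enumerate(fronts):
    print(rank, [label for label, _, _ in front])
```

Here plans A and B form the first front because neither is beaten on both objectives by any other plan; a genetic algorithm of this family would favor such candidates when selecting the next generation.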
Pages: 10863-10882
Page count: 19