HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework

Cited by: 15
Authors
Gandomi, Abolfazl [1]
Reshadi, Midia [1]
Movaghar, Ali [2]
Khademzadeh, Ahmad [3]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Sci & Res Branch, Tehran, Iran
[2] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran
[3] Iran Telecommun Res Ctr, ITRC, Tehran, Iran
Keywords
MapReduce; Scheduling; Hybrid algorithm; Data Locality; Dynamic priority; LOCALITY; PERFORMANCE
DOI
10.1186/s40537-019-0253-9
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Owing to new technologies, devices, and communication channels such as social networking sites, the amount of data produced by humankind grows rapidly every year. Big data refers to collections of large datasets that cannot be processed using traditional computing techniques. MapReduce was introduced to solve large-scale data computation problems. It is specifically designed to run on commodity hardware and relies on the divide-and-conquer principle. Research attention has recently shifted towards Hadoop MapReduce. One of the most notable characteristics of MapReduce is data locality-aware scheduling. A data locality-aware scheduler is an efficient way to optimize one or more performance metrics, such as data locality, energy consumption, and job completion time. As in most computing systems, time and scheduling are among the most important aspects of the MapReduce framework, and many scheduling algorithms have therefore been proposed in past decades. The main goals of these algorithms are to increase the data locality rate and to decrease response and completion times. This paper proposes a new hybrid scheduling algorithm that uses dynamic priority and localization ID techniques and focuses on increasing the data locality rate and decreasing completion time. The proposed algorithm was evaluated against the Hadoop default schedulers (FIFO, Fair) by running concurrent workloads consisting of the Wordcount and Terasort benchmarks. The experimental results show that the proposed algorithm is faster than FIFO and Fair scheduling, achieves a higher data locality rate, and avoids wasting resources.
Pages: 16
Related papers
50 in total
  • [31] Apache Hadoop-MapReduce on YARN framework latency
    El Yazidi, Abdelaziz
    Azizi, Mohamed Saad
    Benlachmi, Yassine
    Hasnaoui, Moulay Lahcen
    [J]. 12TH INTERNATIONAL CONFERENCE ON AMBIENT SYSTEMS, NETWORKS AND TECHNOLOGIES (ANT) / THE 4TH INTERNATIONAL CONFERENCE ON EMERGING DATA AND INDUSTRY 4.0 (EDI40) / AFFILIATED WORKSHOPS, 2021, 184 : 803 - 808
  • [32] HTD: heterogeneous throughput-driven task scheduling algorithm in MapReduce
    Wang, Xite
    Wang, Chaojin
    Bai, Mei
    Ma, Qian
    Li, Guanyu
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2022, 40 (01) : 135 - 163
  • [33] A new data mining algorithm based on MapReduce and hadoop
    Yang, Xianfeng
    Lian, Liming
    [J]. International Journal of Signal Processing, Image Processing and Pattern Recognition, 2014, 7 (02) : 131 - 142
  • [34] MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce
    Akthar, Nadeem
    Ahamad, Mohd Vasim
    Ahmad, Shahbaaz
    [J]. 2016 SECOND INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE & COMMUNICATION TECHNOLOGY (CICT), 2016, : 192 - 198
  • [35] An optimized MapReduce workflow scheduling algorithm for heterogeneous computing
    Tang, Zhuo
    Liu, Min
    Ammar, Almoalmi
    Li, Kenli
    Li, Keqin
    [J]. JOURNAL OF SUPERCOMPUTING, 2016, 72 (06) : 2059 - 2079
  • [36] Hybrid Data Mining Algorithm in Cloud Computing using MapReduce Framework
    Sahay, Siddharth
    Khetarpal, Suruchi
    Pradhan, Tribikram
    [J]. PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION CONTROL AND COMPUTING TECHNOLOGIES (ICACCCT), 2016, : 507 - 511
  • [37] An Approach to Enhance the Performance of Hadoop MapReduce Framework for Big Data
    Chandra, Subhash
    Motwani, Deepak
    [J]. 2016 INTERNATIONAL CONFERENCE ON MICRO-ELECTRONICS AND TELECOMMUNICATION ENGINEERING (ICMETE), 2016, : 178 - 182
  • [38] An optimized MapReduce workflow scheduling algorithm for heterogeneous computing
    Zhuo Tang
    Min Liu
    Almoalmi Ammar
    Kenli Li
    Keqin Li
    [J]. The Journal of Supercomputing, 2016, 72 : 2059 - 2079
  • [39] Framework for Analyzing Web Access Logs using Hadoop and MapReduce
    Borgaonkar, Pranjali
    Kumar, Gaurav
    Yaduwanshi, Jyoti
    [J]. 2018 INTERNATIONAL CONFERENCE ON RECENT INNOVATIONS IN ELECTRICAL, ELECTRONICS & COMMUNICATION ENGINEERING (ICRIEECE 2018), 2018, : 2124 - 2129
  • [40] A Task-Based Greedy Scheduling Algorithm for Minimizing Energy of MapReduce Jobs
    Yousefi, Mostafa Hadadian Nejad
    Goudarzi, Maziar
    [J]. JOURNAL OF GRID COMPUTING, 2018, 16 (04) : 535 - 551