Improving MapReduce scheduler for heterogeneous workloads in a heterogeneous environment

Cited by: 9
Authors
Jeyaraj, Rathinaraja [1]
Ananthanarayana, V. S. [1]
Paul, Anand [2]
Affiliations
[1] Natl Inst Technol Karnataka, Dept IT, Mangalore, Karnataka, India
[2] Kyungpook Natl Univ, Sch Comp Sci & Engn, 80 Daehakro, Daegu 702701, South Korea
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2020 / Vol. 32 / Issue 07
Funding
National Research Foundation, Singapore;
Keywords
bin packing; heterogeneous workloads; jobs; map; reduce task placement; DATA PLACEMENT; BIG DATA; PERFORMANCE;
DOI
10.1002/cpe.5558
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification codes
081202; 0835;
Abstract
Big data is pushing businesses and research sectors to become more data-driven. Hadoop MapReduce is a cost-effective way to process large-scale datasets and is offered as a service over the Internet. Although cloud service providers promise a seemingly infinite supply of on-demand resources, some of the virtual resources hired for MapReduce inevitably go unutilized, and makespan is limited by the various heterogeneities that arise when MapReduce is offered as a service. Because MapReduce v2 lets users define the container size for map and reduce tasks, the jobs in a batch become heterogeneous and behave differently. In addition, virtual machines of different capacities in the MapReduce virtual cluster accommodate varying numbers of map/reduce tasks. These factors strongly affect resource utilization in the virtual cluster and the makespan of a batch of MapReduce jobs. Default MapReduce job schedulers do not consider these heterogeneities of a cloud environment. Moreover, the virtual machines in a MapReduce virtual cluster process an equal number of blocks regardless of their capacity, which affects the makespan. We therefore devised a heuristic-based MapReduce job scheduler that exploits heterogeneity at both the virtual machine level and the MapReduce workload level to improve resource utilization and makespan. We propose two methods to achieve this: (i) roulette-wheel-based data block placement across heterogeneous virtual machines, and (ii) constrained two-dimensional bin packing to place heterogeneous map/reduce tasks. We compared the heuristic-based MapReduce job scheduler against the classical fair scheduler in MapReduce v2. Experimental results show that the proposed scheduler improved makespan and resource utilization by 45.6% and 47.9%, respectively, over the classical fair scheduler.
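The abstract names the two techniques but gives no algorithmic detail, so the following is only a rough sketch of the general ideas, not the authors' scheduler. It assumes that roulette-wheel placement means choosing a virtual machine for each block with probability proportional to a capacity weight, and it substitutes a plain first-fit-decreasing heuristic over two resource dimensions (vcores and memory) for the paper's constrained bin packing. The function names `place_blocks` and `pack_tasks`, the VM names, and all capacities are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Resources = Tuple[int, int]  # (vcores, memory in MB); an assumed resource model


def place_blocks(num_blocks: int, vm_weight: Dict[str, float],
                 rng: random.Random) -> Dict[str, int]:
    """Roulette-wheel selection: each block is assigned to a VM with
    probability proportional to that VM's (hypothetical) capacity weight."""
    names = list(vm_weight)
    weights = [vm_weight[name] for name in names]
    counts = {name: 0 for name in names}
    for vm in rng.choices(names, weights=weights, k=num_blocks):
        counts[vm] += 1
    return counts


def pack_tasks(tasks: Dict[str, Resources],
               vms: Dict[str, Resources]) -> Tuple[Dict[str, List[str]], List[str]]:
    """First-fit-decreasing packing in two dimensions. A task fits on a VM
    only if both its vcores and its memory fit in what is still free."""
    free = {name: list(capacity) for name, capacity in vms.items()}
    placement: Dict[str, List[str]] = {name: [] for name in vms}
    unplaced: List[str] = []
    # Consider the largest containers first (by vcores, then memory).
    ordered = sorted(tasks.items(), key=lambda item: item[1], reverse=True)
    for task_id, (cores, mem) in ordered:
        for name, remaining in free.items():
            if cores <= remaining[0] and mem <= remaining[1]:
                remaining[0] -= cores
                remaining[1] -= mem
                placement[name].append(task_id)
                break
        else:
            # No VM has enough free vcores and memory for this task.
            unplaced.append(task_id)
    return placement, unplaced


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(place_blocks(1000, {"small": 1.0, "large": 8.0}, rng))
    tasks = {"m1": (1, 1024), "m2": (2, 2048), "r1": (4, 4096)}
    vms = {"small": (2, 4096), "large": (4, 8192)}
    print(pack_tasks(tasks, vms))
```

Under these assumptions, a higher-capacity virtual machine tends to receive proportionally more blocks instead of an equal share, and a task is placed only where both resource dimensions fit. A production scheduler would also need to account for data locality and the other constraints that the paper describes but the abstract does not spell out.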
Pages: 10