Probabilistic Network-Aware Task Placement for MapReduce Scheduling

Cited by: 15
Authors
Shen, Haiying [1]
Sarker, Ankur [1]
Yu, Lei [2]
Deng, Feng [1]
Affiliations
[1] Clemson Univ, Dept Elect & Comp Engn, Clemson, SC 29634 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Source
2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) | 2016
Funding
National Science Foundation (USA);
Keywords
MapReduce; task scheduling; job scheduling; data locality;
DOI
10.1109/CLUSTER.2016.48
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Maximizing data locality in task scheduling is critical for the performance of MapReduce job execution. Many existing approaches to MapReduce scheduling decide the placement of map and reduce tasks at a coarse location granularity, measured only in terms of machines and racks. They do not explicitly consider the network topology and data transmission cost, which may cause task straggling and degrade job performance. To improve MapReduce job performance, in this paper we consider task placement with the goal of minimizing the overall data transmission cost of a job execution while balancing transmission cost reduction against resource utilization. We propose a probabilistic network-aware scheduling algorithm that, for a given available task slot, selects the candidate task (map or reduce) with the minimum transmission cost, and then schedules the selected task on the slot with a probability determined by its transmission cost; a lower expected transmission cost leads to a higher probability and vice versa. We also propose a method to more accurately estimate the intermediate data size based on the progress of map tasks; this size is needed to calculate the transmission cost of reduce tasks but is unknown at the time of reduce task scheduling. We implement our probabilistic network-aware scheduling algorithm on Apache Hadoop and conduct experiments on a high-performance computing platform. The experimental results show that our scheduling algorithm outperforms previous approaches in terms of job completion time and cluster resource utilization.
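The selection-then-probabilistic-acceptance idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's cost model, probability function, or size estimator, so everything concrete here is an assumption made for illustration: the cost is taken as data volume times hop count, the acceptance probability as an exponential decay in cost, and the intermediate data size as a linear extrapolation from map progress. All names (`Task`, `hops_to_slot`, `scale`, `try_schedule`, and so on) are hypothetical and do not come from the paper or from Hadoop.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    data_size_mb: float  # data to move: map input, or estimated reduce input
    hops_to_slot: int    # hypothetical network distance from the data to the slot


def estimate_intermediate_size(emitted_mb: float, map_progress: float) -> float:
    """Assumed estimator: linearly extrapolate map output from partial progress."""
    if not 0.0 < map_progress <= 1.0:
        raise ValueError("map_progress must be in (0, 1]")
    return emitted_mb / map_progress


def transmission_cost(task: Task) -> float:
    """Assumed cost model: data volume multiplied by hop count."""
    return task.data_size_mb * task.hops_to_slot


def acceptance_probability(cost: float, scale: float) -> float:
    """Assumed mapping: lower cost gives higher probability; zero cost gives 1.0."""
    return math.exp(-cost / scale)


def try_schedule(candidates, scale=100.0, rng=random):
    """Pick the minimum-cost candidate, then accept it with a cost-dependent probability.

    Returns the chosen task, or None if the slot is left idle for this round.
    """
    if not candidates:
        return None
    best = min(candidates, key=transmission_cost)
    p = acceptance_probability(transmission_cost(best), scale)
    return best if rng.random() < p else None


# Usage: map-2 is data-local (0 hops), so its cost is 0 and it is always accepted.
tasks = [
    Task("map-1", 64.0, 2),
    Task("map-2", 64.0, 0),
    Task("reduce-1", estimate_intermediate_size(30.0, 0.25), 1),
]
chosen = try_schedule(tasks, rng=random.Random(0))
print(chosen.name)  # map-2
```

Declining a high-cost placement leaves the slot free for a later, cheaper task; under these assumptions the `scale` parameter is what trades transmission cost reduction against slot utilization.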
Pages: 241-250
Page count: 10