Probabilistic Network-Aware Task Placement for MapReduce Scheduling

Cited by: 15
Authors
Shen, Haiying [1]
Sarker, Ankur [1]
Yu, Lei [2]
Deng, Feng [1]
Affiliations
[1] Clemson Univ, Dept Elect & Comp Engn, Clemson, SC 29634 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Source
2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) | 2016
Funding
U.S. National Science Foundation (NSF)
Keywords
MapReduce; task scheduling; job scheduling; data locality
DOI
10.1109/CLUSTER.2016.48
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Maximizing data locality in task scheduling is critical for the performance of MapReduce job execution. Many existing works on MapReduce scheduling decide the placement of map and reduce tasks at a coarse location granularity, distinguishing only machines and racks. They do not explicitly consider the network topology and data transmission cost, which may cause task straggling and degrade job performance. To improve MapReduce job performance, in this paper we consider task placement with the goal of minimizing the overall data transmission cost of a job execution while balancing transmission cost reduction against resource utilization. We propose a probabilistic network-aware scheduling algorithm that, for a given available task slot, selects the task (map task or reduce task) with the minimum transmission cost among the task candidates, and then schedules the selected task on the slot with a probability determined by its transmission cost; a lower expected transmission cost leads to a higher probability and vice versa. We also propose a method to more accurately estimate the intermediate data size based on the progress of map tasks; this size is needed to calculate the transmission cost of reduce tasks but is unknown at the time of reduce task scheduling. We implement our probabilistic network-aware scheduling algorithm on Apache Hadoop and conduct experiments on a high-performance computing platform. The experimental results show that our scheduling algorithm outperforms the previous approaches in terms of job completion time and cluster resource utilization.
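The selection and probabilistic placement steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the cost model, the probability function, or the size estimator, so everything concrete here is an assumption made for illustration: the linear mapping from cost to probability, the linear extrapolation of intermediate data size from map progress, and all names (`estimate_intermediate_size`, `pick_task`, `placement_probability`, `try_schedule`, `cost_fn`, `max_cost`). It is not the authors' implementation.

```python
import random


def estimate_intermediate_size(bytes_emitted, progress):
    """Extrapolate a map task's final output size from its progress.

    Assumption: output grows linearly with progress; the paper's
    estimator may differ.
    """
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    return bytes_emitted / progress


def pick_task(candidates, slot, cost_fn):
    """Return the candidate with the lowest transmission cost on this slot."""
    return min(candidates, key=lambda task: cost_fn(task, slot))


def placement_probability(cost, max_cost):
    """Map a cost to a probability: lower cost gives higher probability.

    Assumption: a linear mapping clipped to [0, 1]; the abstract only
    states that the relationship is monotonic.
    """
    if max_cost <= 0:
        return 1.0
    return min(1.0, max(0.0, 1.0 - cost / max_cost))


def try_schedule(candidates, slot, cost_fn, max_cost, rng=random):
    """Pick the cheapest task and place it with a cost-dependent probability.

    Returns the task if it is placed, or None if the slot is left for a
    later scheduling round.
    """
    if not candidates:
        return None
    task = pick_task(candidates, slot, cost_fn)
    p = placement_probability(cost_fn(task, slot), max_cost)
    return task if rng.random() < p else None
```

In this sketch a task with zero transmission cost is always placed, while a task whose cost reaches `max_cost` is never placed on the offered slot, which leaves the slot available for a task that may fit it better.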
Pages: 241-250
Page count: 10
Related papers
50 in total
  • [41] An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters
    Zhao, Hui
    Yang, Shuqiang
    Fan, Hua
    Chen, Zhikun
    Xu, Jinghu
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (12): 2654-2662
  • [42] An Enhanced Data-Locality-Aware Task Scheduling Algorithm for Hadoop Applications
    Choi, Dongjoo
    Jeon, Myunghoon
    Kim, Namgi
    Lee, Byoung-Dai
    IEEE SYSTEMS JOURNAL, 2018, 12 (04): 3346-3357
  • [43] 2PTS: A Two-Phase Task Scheduling Algorithm for MapReduce
    Lim, Byungnam
    Shim, Yeeun
    Chung, Yon Dohn
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (09): 2377-2380
  • [44] SPO: A Secure and Performance-aware Optimization for MapReduce Scheduling
    Maleki, Neda
    Rahmani, Amir Masoud
    Conti, Mauro
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2021, 176
  • [45] Deadline-aware MapReduce Scheduling with Selective Speculative Execution
    Kaur, Simranjit
    Saini, Poonam
    2017 8TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2017
  • [46] MapReduce in the Cloud: Data-Location-Aware VM Scheduling
    Tung Nguyen
    Weisong Shi
    ZTE Communications, 2013, 11 (04): 18-26
  • [47] Context-Aware Task Assignment for MapReduce in Heterogeneous Clouds
    Su, Wei-Tsung
    Pan, Wei-Fan
    Chen, Chao-Chun
    SENSORS AND MATERIALS, 2017, 29 (11): 1497-1512
  • [48] LaSA: A Locality-aware Scheduling Algorithm for Hadoop-MapReduce Resource Assignment
    Chen, Tseng-Yi
    Wei, Hsin-Wen
    Wei, Ming-Feng
    Chen, Ying-Jie
    Hsu, Tsan-Sheng
    Shih, Wei-Kuan
    PROCEEDINGS OF THE 2013 INTERNATIONAL CONFERENCE ON COLLABORATION TECHNOLOGIES AND SYSTEMS (CTS), 2013: 342-346
  • [49] Nap: Network-Aware Data Partitions for Efficient Distributed Processing
    Raz, Or
    Avin, Chen
    Schmid, Stefan
    2019 IEEE 18TH INTERNATIONAL SYMPOSIUM ON NETWORK COMPUTING AND APPLICATIONS (NCA), 2019: 69-77
  • [50] Optimizing MapReduce Task Scheduling on Virtualized Heterogeneous Environments Using Ant Colony Optimization
    Jeyaraj, Rathinaraja
    Paul, Anand
    IEEE ACCESS, 2022, 10: 55842-55855