Probabilistic Network-Aware Task Placement for MapReduce Scheduling

Cited by: 15
Authors
Shen, Haiying [1]
Sarker, Ankur [1]
Yu, Lei [2]
Deng, Feng [1]
Affiliations
[1] Clemson Univ, Dept Elect & Comp Engn, Clemson, SC 29634 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Source
2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) | 2016
Funding
National Science Foundation (USA);
Keywords
MapReduce; task scheduling; job scheduling; DATA LOCALITY;
DOI
10.1109/CLUSTER.2016.48
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Maximizing data locality in task scheduling is critical for the performance of MapReduce job execution. Many existing works on MapReduce scheduling decide the placement of map and reduce tasks at a coarse granularity of location, measured in machines and racks. They do not explicitly consider the network topology and data transmission cost, which may cause task straggling and degrade job performance. To improve MapReduce job performance, in this paper we consider task placement with the goal of minimizing the overall data transmission cost of a job execution while balancing transmission cost reduction against resource utilization. We propose a probabilistic network-aware scheduling algorithm that, for a given available task slot, selects the task (map or reduce) with the minimum transmission cost among the task candidates, and then schedules the selected task on the slot with a probability determined by its transmission cost; a lower expected transmission cost leads to a higher probability and vice versa. We also propose a method to more accurately estimate the intermediate data size based on the progress of map tasks; this size is needed to calculate the transmission cost of reduce tasks but is unknown at the time of reduce task scheduling. We implement our probabilistic network-aware scheduling algorithm on Apache Hadoop and conduct experiments on a high-performance computing platform. The experimental results show that our scheduling algorithm outperforms previous approaches in terms of job completion time and cluster resource utilization.
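The two ideas in the abstract (pick the cheapest candidate for a free slot, then accept it with a cost-dependent probability, and extrapolate intermediate data size from map progress) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives neither the cost model nor the probability function, so the hop-count distance, the linear acceptance rule `1 - cost / max_cost`, the linear size extrapolation, and all names (`Task`, `hop_distance`, `pick_task`, `estimate_intermediate_size`) are assumptions made only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    # (node, size_in_bytes) pairs the task must read; a node is (rack, host).
    sources: list


def hop_distance(a, b):
    """Assumed topology: 0 hops on the same host, 2 within a rack, 4 across racks."""
    if a == b:
        return 0
    return 2 if a[0] == b[0] else 4


def transmission_cost(task, slot_node, distance=hop_distance):
    """Assumed cost model: bytes moved times hop distance, summed over sources."""
    return sum(size * distance(src, slot_node) for src, size in task.sources)


def estimate_intermediate_size(bytes_emitted, progress):
    """Assumed estimator: scale the map output seen so far by 1 / progress."""
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    return bytes_emitted / progress


def pick_task(tasks, slot_node, max_cost, rng=random):
    """Return the cheapest task for the slot, or None if it is probabilistically skipped.

    The acceptance probability 1 - cost / max_cost is an illustrative choice:
    lower cost gives higher probability, as the abstract describes.
    """
    if not tasks:
        return None
    best = min(tasks, key=lambda t: transmission_cost(t, slot_node))
    cost = transmission_cost(best, slot_node)
    p = 1.0 if max_cost <= 0 else max(0.0, 1.0 - cost / max_cost)
    # Leaving the slot idle (None) lets a cheaper slot pick the task up later.
    return best if rng.random() < p else None


local = Task("m1", [(("r1", "h1"), 100)])
remote = Task("m2", [(("r2", "h3"), 100)])
chosen = pick_task([local, remote], ("r1", "h1"), max_cost=400)
```

In this sketch a data-local task has zero cost and is always accepted, while a task whose cost reaches `max_cost` is never accepted on that slot. For a reduce task, `sources` would be filled with sizes from `estimate_intermediate_size`, since the real intermediate sizes are not known until the map tasks finish.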
Pages: 241-250
Number of pages: 10
Related papers
50 in total
  • [31] Network-aware Service Function Chaining Placement in a Data Center
    Hsieh, Cheng-Hsuan
    Chang, Je-Wei
    Chen, Chien
    Lu, Ssu-Hsuan
    2016 18TH ASIA-PACIFIC NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM (APNOMS), 2016,
  • [32] Diktyo: Network-Aware Scheduling in Container-Based Clouds
    Santos, Jose
    Wang, Chen
    Wauters, Tim
    De Turck, Filip
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2023, 20 (04) : 4461 - 4477
  • [33] Towards Network-Aware Service Placement in Community Network Micro-Clouds
    Selimi, Mennan
    Vega, Davide
    Freitag, Felix
    Veiga, Luis
    EURO-PAR 2016: PARALLEL PROCESSING, 2016, 9833 : 376 - 388
  • [34] Network-aware worker placement for wide-area streaming analytics
    Mostafaei, Habib
    Afridi, Shafi
    Abawajy, Jemal
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2022, 136 : 270 - 281
  • [35] Dynamic Performance Aware Reduce Task Scheduling in MapReduce on Virtualized Environment
    Jeyaraj, Rathinaraja
    Ananthanarayana, V. S.
    2018 IEEE/ACIS 16TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING RESEARCH, MANAGEMENT AND APPLICATION (SERA), 2018, : 211 - 218
  • [36] Network-Aware Virtual Machine Placement in Cloud Data Centers: An Overview
    Hamdi, Khaoula
    Kefi, Meriam
    2016 INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS AND COMPUTER SYSTEMS (CIICS), 2016,
  • [37] Algorithms for Network-Aware Application Component Placement for Cloud Resource Allocation
    Barshan, Maryam
    Moens, Hendrik
    Latre, Steven
    Volckaert, Bruno
    De Turck, Filip
    JOURNAL OF COMMUNICATIONS AND NETWORKS, 2017, 19 (05) : 493 - 508
  • [38] Network-Aware Server Placement for Highly Interactive Distributed Virtual Environments
    Ta, Duong
    Zhou, Suiping
    Cai, Wentong
    Tang, Xueyan
    Ayani, Rassul
    DS-RT 2008: 12TH 2008 IEEE/ACM INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS, PROCEEDINGS, 2008, : 95 - +
  • [39] Network-Aware Container Placement in Cloud-Edge Kubernetes Clusters
    Marchese, Angelo
    Tomarchio, Orazio
    2022 22ND IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2022), 2022, : 859 - 865
  • [40] Network-Aware Locality Scheduling for Distributed Data Operators in Data Centers
    Cheng, Long
    Wang, Ying
    Liu, Qingzhi
    Epema, Dick H. J.
    Liu, Cheng
    Mao, Ying
    Murphy, John
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (06) : 1494 - 1510