GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers

Cited by: 0
Authors
Moïse W. Convolbo
Jerry Chou
Ching-Hsien Hsu
Yeh Ching Chung
Affiliations
[1] National Tsing Hua University, School of Mathematics and Big Data
[2] Foshan University
[3] Chung Hua University
Source
Computing | 2018 / Vol. 100
Keywords
Geo-distributed; Data center; Scheduling; Data locality; Batch jobs; Big data analysis; 90C05 Linear programming; 90C27 Combinatorial optimization; 90C46 Optimality conditions, duality
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Today, data-intensive applications rely on geographically distributed systems for data collection, storage, and processing. Data locality is widely regarded as a prominent technique for improving application performance and reducing the impact of network latency: jobs are scheduled directly on the nodes that host the data to be processed. MapReduce and Dryad are examples of frameworks that exploit locality by splitting jobs into multiple tasks, which are dispatched to process portions of the data locally. However, as the ecosystem of big data analysis has shifted from single clusters to geo-distributed data centers, some data must inevitably still be transferred over the network in order to reduce the schedule length. Existing scheduling techniques nevertheless lack a mechanism that efficiently blends data locality with inter-data-center transfer requirements for data-intensive processing across dispersed data centers. The objective of this work is therefore to formulate and solve the makespan optimization problem for data-intensive job scheduling on geo-distributed data centers. To this end, we first formulate task placement and data access as a linear program and solve it with the GLPK solver. We then present a low-complexity heuristic scheduling algorithm called GeoDis, which balances data locality against the data transfer requirement to achieve a better makespan. Experiments with various realistic traces and synthetically generated workloads show that GeoDis can reduce the makespan of processing jobs by 44% compared to state-of-the-art algorithms and remains within 91% of the optimal solution computed by the LP solver.
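The trade-off the abstract describes, running a task where its data lives versus paying a transfer cost to run it on a less loaded data center, can be illustrated with a small greedy sketch. This is not the GeoDis algorithm or the paper's LP formulation, neither of which is given in this record. The `Task` fields, the single per-task `transfer_time`, and the longest-task-first ordering are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    home: str             # data center holding the task's input (assumed single copy)
    proc_time: float      # processing time, assumed equal on every data center
    transfer_time: float  # assumed extra time to ship the input to any other data center


def greedy_schedule(tasks, data_centers):
    """Place each task on the data center where it would finish earliest.

    Illustrative list-scheduling heuristic: a remote placement pays the
    task's transfer_time, a local placement pays nothing extra.
    """
    ready = {dc: 0.0 for dc in data_centers}  # time at which each data center is free
    placement = {}
    # Longest tasks first, a common list-scheduling ordering (an assumption here).
    for task in sorted(tasks, key=lambda t: t.proc_time, reverse=True):
        def finish(dc):
            penalty = 0.0 if dc == task.home else task.transfer_time
            return ready[dc] + penalty + task.proc_time

        best = min(data_centers, key=finish)
        ready[best] = finish(best)
        placement[task.name] = best
    return placement, max(ready.values())


def local_only_makespan(tasks, data_centers):
    """Makespan when every task must run where its data is stored."""
    load = {dc: 0.0 for dc in data_centers}
    for task in tasks:
        load[task.home] += task.proc_time
    return max(load.values())


# Three tasks whose data all sits in data center "A"; "B" is idle.
tasks = [Task("t1", "A", 4.0, 1.0), Task("t2", "A", 4.0, 1.0), Task("t3", "A", 4.0, 1.0)]
placement, makespan = greedy_schedule(tasks, ["A", "B"])
```

In this toy instance, strict locality serializes all three tasks on "A" for a makespan of 12.0, while the greedy sketch moves one task to "B" and finishes at 8.0. The example only shows why some transfers can shorten the schedule; it makes no claim about the magnitude of the gains reported in the abstract.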
Pages: 21 - 46
Number of pages: 25
Related papers
50 items in total
  • [31] Data Locality-Aware Big Data Query Evaluation in Distributed Clouds
    Xia, Qiufen
    Liang, Weifa
    Xu, Zichuan
    COMPUTER JOURNAL, 2017, 60 (06): : 791 - 809
  • [32] Towards WAN-Aware Join Sampling over Geo-Distributed Data
    Kumar, Dhruv
    Wolfrath, Joel
    Chandra, Abhishek
    Sitaraman, Ramesh K.
    PROCEEDINGS OF THE 5TH INTERNATIONAL WORKSHOP ON EDGE SYSTEMS, ANALYTICS AND NETWORKING (EDGESYS'22), 2022, : 13 - 18
  • [33] Renewable Energy-Aware Big Data Analytics in Geo-Distributed Data Centers with Reinforcement Learning
    Xu, Chenhan
    Wang, Kun
    Li, Peng
    Xia, Rui
    Guo, Song
    Guo, Minyi
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2020, 7 (01): : 205 - 215
  • [34] Green Computing with Geo-Distributed Heterogeneous Data Centers
    Pasricha, Sudeep
    Hogade, Ninad
    Siegel, Howard Jay
    Maciejewski, Anthony A.
    2019 TENTH INTERNATIONAL GREEN AND SUSTAINABLE COMPUTING CONFERENCE (IGSC), 2019,
  • [35] Yugong: Geo-Distributed Data and Job Placement at Scale
    Huang, Yuzhen
    Shi, Yingjie
    Zhong, Zheng
    Feng, Yihui
    Cheng, James
    Li, Jiwei
    Fang, Haochuan
    Li, Chao
    Guan, Tao
    Zhou, Jingren
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2019, 12 (12): : 2155 - 2169
  • [36] Investigation of Network Traffic in Geo-Distributed Data Centers
    Koshiba, Yutaka
    Chen, Wuhui
    Yamada, Yuichi
    Tanaka, Takazumi
    Paik, Incheon
    2015 IEEE 7TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE & TECHNOLOGY (ICAST), 2015, : 174 - 179
  • [37] Fast media caching for geo-distributed data centers
    Zhang, Wei
    Wen, Yonggang
    Liu, Fang
    Chen, Yiqiang
    Fan, Rui
    COMPUTER COMMUNICATIONS, 2018, 120 : 46 - 57
  • [38] Holistic Management of Sustainable Geo-Distributed Data Centers
    Abbasi, Zahra
    Gupta, Sandeep K. S.
    2015 IEEE 22ND INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2015, : 426 - 435
  • [39] Early warning disaster-aware service protection in geo-distributed data centers
    Ma, Lisheng
    Su, Wei
    Wu, Bin
    Yang, Bin
    Jiang, Xiaohong
    COMPUTER NETWORKS, 2020, 180 (180)
  • [40] Power and Cost-aware Virtual Machine Placement in Geo-distributed Data Centers
    Rawas, Soha
    Zekri, Ahmed
    El Zaart, Ali
    CLOSER: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, 2018, : 112 - 123