Job scheduling and data replication on data grids

Cited by: 76
Authors
Chang, Ruay-Shiung [1]
Chang, Jih-Sheng [1]
Lin, Shin-Yi [1]
Affiliation
[1] Natl Dong Hwa Univ, Dept Comp Sci & Informat Engn, Shoufeng 974, Hualien, Taiwan
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2007 / Vol. 23 / No. 7
Keywords
data replication; data grid; job scheduling
DOI
10.1016/j.future.2007.02.008
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In data grids, many distributed scientific and engineering applications require access to large amounts of data (terabytes or petabytes). Data access time depends on bandwidth, especially in a cluster grid, where network bandwidth within a cluster is larger than across clusters. The major bottleneck to fast data access in grids is the high latency of Wide Area Networks (WANs) and the Internet. Effective scheduling in such a network architecture can reduce the amount of data transferred across the Internet by dispatching a job to where the needed data are present. Another solution is to use a data replication mechanism that generates multiple copies of existing data, reducing the need to access a remote site. To utilize these two concepts, in this paper we develop a job scheduling policy, called HCS (Hierarchical Cluster Scheduling), and a dynamic data replication strategy, called HRS (Hierarchical Replication Strategy), to improve data access efficiency in a cluster grid. We simulate our algorithms under various combinations of data access patterns. We also implement HCS and HRS in the Taiwan Unigrid environment. The simulation and experimental results show that HCS and HRS successfully reduce data access time and the amount of inter-cluster communication in comparison with other strategies in a cluster grid. (C) 2007 Elsevier B.V. All rights reserved.
Pages: 846-860
Page count: 15