Job scheduling and data replication on data grids

Cited by: 76
Authors
Chang, Ruay-Shiung [1]
Chang, Jih-Sheng [1]
Lin, Shin-Yi [1]
Affiliations
[1] Natl Dong Hwa Univ, Dept Comp Sci & Informat Engn, Shoufeng 974, Hualien, Taiwan
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2007 / Vol. 23 / Issue 7
Keywords
data replication; data grid; job scheduling
DOI
10.1016/j.future.2007.02.008
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In data grids, many distributed scientific and engineering applications require access to large amounts of data (terabytes or petabytes). Data access time depends on bandwidth, especially in a cluster grid, where network bandwidth within a cluster is larger than across clusters. The major bottleneck to fast data access in grids is the high latency of Wide Area Networks (WANs) and the Internet. Effective scheduling in such a network architecture can reduce the amount of data transferred across the Internet by dispatching a job to where the needed data are present. Another solution is to use a data replication mechanism that generates multiple copies of existing data, reducing the need to access data from a remote site. To utilize these two concepts, in this paper we develop a job scheduling policy, called HCS (Hierarchical Cluster Scheduling), and a dynamic data replication strategy, called HRS (Hierarchical Replication Strategy), to improve data access efficiency in a cluster grid. We evaluate our algorithms by simulation under various combinations of data access patterns. We also implement HCS and HRS in the Taiwan Unigrid environment. The simulation and experimental results show that HCS and HRS successfully reduce data access time and the amount of inter-cluster communication in comparison with other strategies in a cluster grid. (C) 2007 Elsevier B.V. All rights reserved.
Pages: 846-860
Page count: 15