A data placement strategy in scientific cloud workflows

Cited by: 255
Authors
Yuan, Dong [1]
Yang, Yun [1]
Liu, Xiao [1]
Chen, Jinjun [1]
Affiliations
[1] Swinburne Univ Technol, Fac Informat & Commun Technol, Melbourne, Vic 3122, Australia
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2010 / Vol. 26 / Issue 08
Funding
Australian Research Council;
Keywords
Data management; Scientific workflow; Cloud computing;
DOI
10.1016/j.future.2010.02.004
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In scientific cloud workflows, large amounts of application data need to be stored in distributed data centres. To effectively store these data, a data manager must intelligently select data centres in which these data will reside. This is, however, not the case for data which must have a fixed location. When one task needs several datasets located in different data centres, the movement of large volumes of data becomes a challenge. In this paper, we propose a matrix-based k-means clustering strategy for data placement in scientific cloud workflows. The strategy contains two algorithms that group the existing datasets into k data centres during the workflow build-time stage and dynamically cluster newly generated datasets to the most appropriate data centres, based on dependencies, during the runtime stage. Simulations show that our algorithm can effectively reduce data movement during the workflow's execution. © 2010 Elsevier B.V. All rights reserved.
Pages: 1200-1214
Number of pages: 15