Efficient Location-Aware Data Placement for Data-Intensive Applications in Geo-distributed Scientific Data Centers

Cited by: 22
Authors
Zhang, Jinghui [1]
Chen, Jian [1]
Luo, Junzhou [1]
Song, Aibo [1]
Affiliations
[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing 211189, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
data placement; geo-distributed; data center; Lagrangian relaxation; DEDICATED HETEROGENEOUS MULTICLUSTER; WORKFLOW;
DOI
10.1109/TST.2016.7590316
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recent developments in cloud computing and big data have spurred the emergence of data-intensive applications in which massive scientific datasets are stored in globally distributed scientific data centers and are accessed frequently by scientists worldwide. A single data processing task may request multiple associated data items distributed across different scientific data centers, and data placement decisions must respect the storage capacity limits of those data centers. Optimizing the data access cost when placing data items in globally distributed scientific data centers has therefore become an increasingly important goal. Existing data placement approaches for geo-distributed data items are insufficient because they either cannot cope with the cost incurred by associated data access or overlook storage capacity limitations, which are a very practical constraint of scientific data centers. In this paper, inspired by applications in the field of high energy physics, we propose an integer-programming-based data placement model that addresses both challenges; the resulting problem is Non-deterministic Polynomial-time (NP)-hard. In addition, we use a Lagrangian-relaxation-based heuristic algorithm to obtain ideal data placement solutions. Our simulation results demonstrate that our algorithm is effective and significantly reduces overall data access cost.
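
The abstract names the technique but gives no formulation, so the following is only a minimal sketch of how Lagrangian relaxation can be applied to a capacity-constrained placement problem. It is not the paper's model: it assumes a simplified setting with one independent access cost per (item, site) pair and ignores the associated-data-access cost that the paper addresses. All names (`lagrangian_placement`, `greedy_repair`, `cost`, `size`, `cap`) are hypothetical.

```python
def greedy_repair(cost, size, cap, lam):
    """Build a capacity-feasible placement greedily, largest items first.

    Sites are ranked by the penalized cost cost[i][j] + lam[j] * size[i].
    Returns None if some item does not fit anywhere.
    """
    remaining = list(cap)
    assign = [None] * len(cost)
    for i in sorted(range(len(cost)), key=lambda i: -size[i]):
        sites = [j for j in range(len(cap)) if remaining[j] >= size[i]]
        if not sites:
            return None
        j = min(sites, key=lambda j: cost[i][j] + lam[j] * size[i])
        assign[i] = j
        remaining[j] -= size[i]
    return assign


def lagrangian_placement(cost, size, cap, iters=50, step0=1.0):
    """Heuristic placement via Lagrangian relaxation of the capacity constraints.

    cost[i][j]: assumed access cost of storing item i at site j.
    size[i]:    storage needed by item i.
    cap[j]:     storage capacity of site j.
    """
    n, m = len(cost), len(cap)
    lam = [0.0] * m  # one multiplier per relaxed capacity constraint
    best_assign, best_cost = None, float("inf")
    for k in range(iters):
        # Relaxed subproblem: capacities are ignored, so each item
        # independently picks the site with the lowest penalized cost.
        relaxed = [
            min(range(m), key=lambda j: cost[i][j] + lam[j] * size[i])
            for i in range(n)
        ]
        # Heuristic step: turn the current multipliers into a feasible placement.
        feasible = greedy_repair(cost, size, cap, lam)
        if feasible is not None:
            total = sum(cost[i][feasible[i]] for i in range(n))
            if total < best_cost:
                best_assign, best_cost = feasible, total
        # Subgradient update: raise the price of overloaded sites,
        # lower it (down to zero) for underused ones.
        load = [0] * m
        for i, j in enumerate(relaxed):
            load[j] += size[i]
        step = step0 / (k + 1)
        lam = [max(0.0, lam[j] + step * (load[j] - cap[j])) for j in range(m)]
    return best_assign, best_cost


# Three unit-size items, two sites of capacity 2; site 0 is cheaper for every item.
placement, total = lagrangian_placement([[1, 5], [1, 5], [1, 5]], [1, 1, 1], [2, 2])
```

In this toy instance only two items fit at the cheaper site, so the third is pushed to the more expensive one. The multipliers act as per-site storage prices, which is the general idea behind relaxing capacity constraints; the paper's actual objective and heuristic may differ.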
Pages: 471-481
Number of pages: 11