Efficient Data Placement and Replication for QoS-Aware Approximate Query Evaluation of Big Data Analytics

Cited by: 18
Authors
Xia, Qiufen [1]
Xu, Zichuan [2]
Liang, Weifa [3]
Yu, Shui [4]
Guo, Song [5]
Zomaya, Albert Y. [6]
Affiliations
[1] Dalian Univ Technol, Key Lab Ubiquitous Network & Serv Software Liaoni, Int Sch Informat Sci & Engn, Dalian 116024, Liaoning, Peoples R China
[2] Dalian Univ Technol, Sch Software, Dalian 116024, Liaoning, Peoples R China
[3] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT 2601, Australia
[4] Univ Technol Sydney, Sch Software, Ultimo, NSW 2007, Australia
[5] Hong Kong Polytech Univ, Dept Comp, Hung Hom, Hong Kong, Peoples R China
[6] Univ Sydney, Sch Comp Sci, Camperdown, NSW 2006, Australia
Funding
National Natural Science Foundation of China
Keywords
Big Data; Query processing; Delays; Approximation algorithms; Quality of service; Distributed databases; Software; Data replication and placement; big data analytics; approximate query evaluation; approximation algorithms; algorithm analysis; OPERATIONAL COST MINIMIZATION; DATA CENTERS;
DOI
10.1109/TPDS.2019.2921337
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Enterprise users at different geographic locations generate large volumes of data that are stored at geographically distributed datacenters. These users may also perform big data analytics on the stored data to identify valuable information for making strategic decisions. However, it is well known that performing big data analytics on data in geo-distributed datacenters is usually time-consuming and costly. In some delay-sensitive applications, a query result may become useless if answering the query takes too long. Instead, users may sometimes be interested only in timely approximate results rather than exact query results. For such approximate query evaluation, applications must either sacrifice timeliness to obtain more accurate evaluation results, or tolerate an evaluation result with a guaranteed error bound, obtained by analyzing samples of the data, in order to meet their stringent delay requirements. In this paper, we study quality-of-service (QoS)-aware data replication and placement for approximate query evaluation of big data analytics in a distributed cloud, where the original (source) data of a query is distributed across geo-distributed datacenters. We focus on the problems of placing data samples of the source data at some strategic datacenters to meet stringent query delay requirements of users, by exploring a non-trivial trade-off between the cost of query evaluation and the error bound of the evaluation result. We first propose an approximation algorithm with a provable approximation ratio for a single approximate query. We then develop an efficient heuristic algorithm for evaluating a set of approximate queries with the aim of minimizing the evaluation cost while meeting the delay requirements of these queries. We finally demonstrate the effectiveness and efficiency of the proposed algorithms through both experimental simulations and implementations in a real test-bed, using real datasets. Experimental results show that the proposed algorithms are promising.
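The trade-off the abstract describes between evaluation cost and error bound can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes uniform random sampling, a mean-style aggregate query, and a normal-approximation confidence interval. The function name `approximate_mean`, the synthetic data, and the sample sizes are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def approximate_mean(sample, z=1.96):
    """Estimate a mean from a sample; return (estimate, error_bound).

    Assumption: error_bound is the half-width of a normal-approximation
    confidence interval (z = 1.96 corresponds to roughly 95% confidence).
    """
    n = len(sample)
    estimate = statistics.mean(sample)
    error_bound = z * statistics.stdev(sample) / math.sqrt(n)
    return estimate, error_bound


# Synthetic "source data" standing in for a large dataset (hypothetical).
rng = random.Random(0)
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

# A small sample is cheap to store and scan but gives a loose bound;
# a larger sample costs more and gives a tighter bound.
small = rng.sample(population, 100)
large = rng.sample(population, 10_000)

est_small, bound_small = approximate_mean(small)
est_large, bound_large = approximate_mean(large)

print(f"n=100:    estimate={est_small:.2f}, error bound=+/-{bound_small:.2f}")
print(f"n=10000:  estimate={est_large:.2f}, error bound=+/-{bound_large:.2f}")
```

Under these assumptions the error bound shrinks roughly with the square root of the sample size, so a hundredfold larger sample buys about a tenfold tighter bound. That is the kind of cost versus accuracy tension a sample placement strategy has to balance against query delay.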
Pages: 2677-2691
Page count: 15