Impact analysis of data placement strategies on query efforts in distributed RDF stores

Cited by: 5
Authors
Janke, Daniel [1]
Staab, Steffen [1, 2]
Thimm, Matthias [1]
Affiliations
[1] Univ Koblenz Landau, Inst Web Sci & Technol, Univ Str 1, D-56070 Koblenz, Germany
[2] Univ Southampton, Web & Internet Sci Grp, Bldg 32, Highfield Campus, Southampton SO17 1BJ, Hants, England
Source
Journal of Web Semantics | 2018 / Vol. 50
Keywords
Distributed RDF stores; Graph partitioning; Benchmark; SPARQL; MapReduce
DOI
10.1016/j.websem.2018.02.002
CLC number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, scalable RDF stores have been developed for the cloud, in which graph data is distributed over compute and storage nodes to scale query processing and to meet memory needs. One main challenge in these RDF stores is the data placement strategy, which can be formalized in terms of graph covers. A graph cover determines whether (a) the triples are distributed evenly over all storage nodes (storage balance), (b) different query results can be computed on several compute nodes in parallel (vertical parallelization), and (c) individual query results can be produced from triples assigned to only a few (ideally one) storage nodes (horizontal containment). We analyse the impact of the three most commonly used graph cover strategies in these terms and find that balancing the query workload reduces query execution time more than reducing data transfer over the network. To carry out this analysis, we present our novel benchmark and open-source evaluation platform Koral. (C) 2018 Elsevier B.V. All rights reserved.
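Two of the properties named in the abstract, storage balance and horizontal containment, can be made concrete with a toy example. The following Python sketch is not the paper's method or Koral's API. It assumes a simple subject-hash cover (each triple is stored on the node obtained by hashing its subject), and the helper names `subject_hash_cover`, `storage_balance` and `nodes_touched`, as well as the min/max load ratio used as a balance measure, are hypothetical choices made for illustration only.

```python
from collections import Counter
from typing import Dict, List, Tuple
from zlib import crc32

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def subject_hash_cover(triples: List[Triple], num_nodes: int) -> Dict[Triple, int]:
    """Hypothetical hash cover: store each triple on the node chosen by hashing its subject."""
    # crc32 is deterministic across runs, unlike Python's salted built-in hash() for str
    return {t: crc32(t[0].encode("utf-8")) % num_nodes for t in triples}


def storage_balance(cover: Dict[Triple, int], num_nodes: int) -> float:
    """Illustrative balance ratio: smallest node load / largest node load (1.0 = perfectly even).

    Assumes the cover contains at least one triple, so the largest load is non-zero.
    """
    loads = Counter(cover.values())
    sizes = [loads.get(node, 0) for node in range(num_nodes)]
    return min(sizes) / max(sizes)


def nodes_touched(result: List[Triple], cover: Dict[Triple, int]) -> int:
    """Number of storage nodes holding the triples of one query result (ideally 1)."""
    return len({cover[t] for t in result})


triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", '"Bob"'),
]
cover = subject_hash_cover(triples, num_nodes=2)
star = [triples[0], triples[1]]  # both triples share the subject ex:alice
path = [triples[0], triples[2]]  # alice knows bob, bob knows carol
print(storage_balance(cover, 2), nodes_touched(star, cover), nodes_touched(path, cover))
```

Under subject hashing, a star-shaped result whose triples share one subject always lies on a single node, whereas a path-shaped result may span two nodes, depending on where the hash places each subject. This is a simple illustration of why the choice of graph cover affects horizontal containment.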
Pages: 21-48
Number of pages: 28