Efficient Distributed Query Processing on Large Scale RDF Graph Data

被引:0
作者
Wang X. [1 ,2 ]
Xu Q. [1 ,2 ]
Chai L.-L. [1 ,2 ]
Yang Y.-J. [1 ,2 ,4 ]
Chai Y.-P. [3 ]
机构
[1] College of Intelligence and Computing, Tianjin University, Tianjin
[2] Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin
[3] School of Information, Renmin University of China, Beijing
[4] State Key Laboratory of Digital Publishing Technology, Beijing
来源
Ruan Jian Xue Bao/Journal of Software | 2019年 / 30卷 / 03期
基金
中国国家自然科学基金;
关键词
Basic graph pattern matching; Distributed; Large scale RDF graphs; MapReduce; Star decomposition;
D O I
10.13328/j.cnki.jos.005696
中图分类号
学科分类号
摘要
Knowledge graphs are the main representation form of intelligent data. With the development of knowledge graphs, more and more intelligent data has been released in the form of the resource description framework (RDF). It is known that the semantics of SPARQL correspond to graph homomorphism which is an NP-complete problem. Therefore, how to efficiently answer SPARQL queries in parallel over big RDF graphs has been widely recognized as a challenging problem. There are some research works using the MapReduce computational model to process big RDF graph. However, SPARQL queries in these works are decomposed into the set of query clauses without considering any semantics and graph structure embedded in RDF graph, which leads to overmuch MapReduce iterations. This study first decomposes the SPARQL query graph into a set of stars by utilizing the semantic and structural information embedded RDF graphs as heuristics, which can be matched in fewer MapReduce iterations. Meanwhile, a matching order of these stars is given to reduce intermediate results in MapReduce iterations. During the matching phase, each round of MapReduce adds one star with the join operation. The extensive experiments on both synthetic dataset WatDiv, and real-world dataset DBpedia are carried out. The experiments results demonstrate that the proposed star decomposition-based method can answer SPARQL BGP queries efficiently, which outperforms SHARD and S2X by one order of magnitude. Finally, extensive experiments show that the performance of the optimization algorithms is improved by 49.63% and 78.71% than the basic algorithm over both synthetic and real datasets. © Copyright 2019, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
引用
收藏
页码:498 / 514
页数:16
相关论文
共 22 条
[1]  
Hammoud M., Rabbou D.A., Nouri R., Beheshti S.M.R., Sakr S., DREAM: Distributed RDF engine with adaptive query planner and minimal communication, Proc. of the VLDB Endowment, 8, 6, pp. 654-665, (2015)
[2]  
Perez J., Arenas M., Gutierrez C., Semantics and complexity of SPARQL, Proc. of the Semantic Web (ISWC 2006), pp. 30-43, (2006)
[3]  
Dyer M., Greenhill C., The complexity of counting graph homomorphisms, Random Structures & Algorithms, 17, 3-4, pp. 260-289, (2000)
[4]  
Du F., Chen Y.G., Du X.Y., Survey of RDF query processing techniques, Ruan Jian Xue Bao/Journal of Software, 24, 6, pp. 1222-1242, (2013)
[5]  
Neumann T., Weikum G., RDF-3X: A RISC-style engine for RDF, Proc. of the VLDB Endowment, 1, 1, pp. 647-659, (2008)
[6]  
Zou L., Chen L., Shen X., Huang R., Zhao D., gStore: A graph-based SPARQL query engine, VLDB Journal-The Int'l Journal on Very Large Data Bases, 23, 4, pp. 565-590, (2014)
[7]  
Rohloff K., Schantz R.E., High-performance, massively scalable distributed systems using the MapReduce software framework: The SHARD triple-store, Proc. of the Programming Support Innovations for Emerging Distributed Applications, (2010)
[8]  
Husain M., Mcglothlin J., Masud M.M., Khan L.R., Thuraisingham B., Heuristics-based query processing for large RDF graphs using cloud computing, IEEE Trans. on Knowledge & Data Engineering, 23, 9, pp. 1312-1327, (2011)
[9]  
Schatzle A., Przyjaciel-Zablocki M., Berberich T., Lausen G., S2X: Graph-parallel querying of RDF with GraphX, Proc. of the Biomedical Data Management and Graph Online Querying, pp. 155-168, (2016)
[10]  
Erling O., Mikhailov I., Virtuoso: RDF Support in a Native RDBMS, pp. 501-519, (2009)