A Distributed Engine for Multi-query Processing Based on Predicates with Spark

Times cited: 0
Authors
Zhang, Bin [1]
Sun, Ximin [1]
Bi, Liwei [2]
Zhao, Changhao [2]
Chen, Xin [2]
Li, Xin [2]
Sun, Lei [3]
Affiliations
[1] State Grid Electronic Commerce Co., Ltd., State Grid Financial Technology Group, Beijing, People's Republic of China
[2] State Grid E-commerce Technology Co., Ltd., Beijing, People's Republic of China
[3] Tianjin University, College of Intelligence and Computing, Peiyang Park Campus, Tianjin, People's Republic of China
Source
Web and Big Data | 2021 / Vol. 1505
Keywords
Multi-query; RDF; Spark
DOI
10.1007/978-981-16-8143-1_3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The core of multi-query optimization (MQO) is to find the maximum common substructure of the query graphs. However, this problem is equivalent to the weighted set cover problem, which is NP-complete. In this paper, we propose a distributed RDF engine for multi-query processing with Spark. The system processes SPARQL queries by translating them into Spark SQL. We use the predicates of each query as its features and cluster queries that share many common features into groups. We conduct experiments on synthetic datasets; compared with processing the same queries without MQO, the results show the effectiveness of our approach.
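The abstract states that SPARQL queries are translated into Spark SQL but gives neither the storage layout nor the translation rules. The sketch below assumes the common textbook layout of a single triple table `triples(s, p, o)` (a hypothetical schema, not necessarily the authors') and translates a basic graph pattern into one SQL query: each triple pattern becomes a self-join alias, a variable shared between patterns becomes a join condition, and a constant becomes a filter. The function name `bgp_to_sql` and the predicate names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# A triple pattern is (subject, predicate, object); terms starting with "?" are variables.
TriplePattern = Tuple[str, str, str]


def bgp_to_sql(patterns: List[TriplePattern], table: str = "triples") -> str:
    """Translate a SPARQL basic graph pattern into SQL over a hypothetical
    single table `table(s, p, o)`, using one self-join alias per pattern."""
    first_binding: Dict[str, str] = {}  # variable name -> first column that binds it
    conditions: List[str] = []
    froms: List[str] = []
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        froms.append(f"{table} {alias}")
        for col, term in zip(("s", "p", "o"), pattern):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                var = term[1:]
                if var in first_binding:
                    # Variable seen before: it becomes an equi-join condition.
                    conditions.append(f"{first_binding[var]} = {ref}")
                else:
                    first_binding[var] = ref
            else:
                # Constant term: it becomes a filter (single quotes escaped by doubling).
                escaped = term.replace("'", "''")
                conditions.append(f"{ref} = '{escaped}'")
    select = ", ".join(f"{ref} AS {var}" for var, ref in first_binding.items()) or "*"
    sql = f"SELECT {select} FROM {', '.join(froms)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


if __name__ == "__main__":
    # Illustrative query: students and the courses they take.
    query = [("?x", "rdf:type", "ub:Student"), ("?x", "ub:takesCourse", "?c")]
    print(bgp_to_sql(query))
```

In PySpark, the generated string could be run by registering the triple DataFrame with `createOrReplaceTempView("triples")` and passing the string to `spark.sql(...)`, leaving join ordering to Spark's optimizer.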
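The abstract says queries are grouped by shared predicate features but does not name the clustering algorithm or the similarity measure. As a minimal sketch under stated assumptions, each query below is reduced to its set of predicates and grouped with a greedy leader-based pass using Jaccard similarity; this is a swapped-in technique, not necessarily the authors' method, and `threshold`, the function names and the query data are hypothetical. The predicates shared by every query in a group are natural candidates for a common sub-pattern that could be evaluated once per group.

```python
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two predicate sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_predicates(queries: Dict[str, FrozenSet[str]],
                          threshold: float = 0.5) -> List[List[str]]:
    """Greedy leader clustering: each query joins the first group whose leader's
    predicate set is at least `threshold` similar, otherwise it starts a new group."""
    groups: List[List[str]] = []
    leaders: List[FrozenSet[str]] = []
    for qid, preds in queries.items():
        for idx, leader in enumerate(leaders):
            if jaccard(preds, leader) >= threshold:
                groups[idx].append(qid)
                break
        else:
            # No sufficiently similar group: this query becomes a new leader.
            groups.append([qid])
            leaders.append(preds)
    return groups


def shared_predicates(group: List[str],
                      queries: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """Predicates that appear in every query of a (non-empty) group."""
    return frozenset.intersection(*(queries[q] for q in group))


if __name__ == "__main__":
    # Hypothetical workload: each query is represented only by its predicates.
    workload = {
        "q1": frozenset({"rdf:type", "ub:takesCourse", "ub:name"}),
        "q2": frozenset({"rdf:type", "ub:takesCourse"}),
        "q3": frozenset({"ub:advisor", "ub:worksFor"}),
        "q4": frozenset({"ub:advisor", "ub:worksFor", "ub:emailAddress"}),
    }
    for group in cluster_by_predicates(workload):
        print(group, sorted(shared_predicates(group, workload)))
```

A single leader-based pass avoids computing an exact maximum common substructure, which the abstract notes is NP-complete; the trade-off is that the resulting groups depend on query order and on the chosen threshold.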
Pages: 27-36
Page count: 10