Efficient RDF Knowledge Graph Partitioning Using Querying Workload

Cited by: 3
Authors
Akhter, Adnan [1]
Saleem, Muhammad [2]
Bigerl, Alexander [1]
Ngomo, Axel-Cyrille Ngonga [1]
Affiliations
[1] Paderborn Univ, Dept Comp Sci, Data Sci Grp, Paderborn, Germany
[2] Univ Leipzig, AKSW Res Grp, Leipzig, Germany
Source
PROCEEDINGS OF THE 11TH KNOWLEDGE CAPTURE CONFERENCE (K-CAP '21) | 2021
Keywords
RDF knowledge graph partitioning; querying workload; predicate co-occurrence; PCG; PCM
DOI
10.1145/3460210.3493577
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data partitioning is an effective way to manage large datasets. While a broad range of RDF graph partitioning techniques has been proposed in previous work, little attention has been given to workload-aware RDF graph partitioning. In this paper, we propose two techniques that use the querying workload to detect the portions of RDF graphs that are often queried together. Our techniques leverage predicate co-occurrences in SPARQL queries. By detecting highly co-occurring predicates, our techniques can keep the data pertaining to these predicates in the same data partition. We evaluate the proposed partitioning techniques using various real datasets and query benchmarks generated by the FEASIBLE SPARQL benchmark generation framework. Our evaluation results show that the proposed techniques outperform previous techniques in terms of query runtime.
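The idea of keeping co-occurring predicates together can be illustrated with a minimal Python sketch. It is not the algorithm from the paper, whose details are not given in this record. It assumes the predicates of each query have already been extracted into lists, and it uses a simple greedy heuristic: merge the most frequently co-occurring predicate pairs first, under a group-size cap, then pack the groups into `k` partitions. The function names (`cooccurrence_counts`, `group_predicates`, `partition_triples`) and the sample predicates are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(query_predicates):
    """Count how often each unordered predicate pair appears in the same query."""
    counts = Counter()
    for preds in query_predicates:
        for a, b in combinations(sorted(set(preds)), 2):
            counts[(a, b)] += 1
    return counts


def group_predicates(query_predicates, k):
    """Map each predicate to one of k partitions (illustrative greedy heuristic)."""
    counts = cooccurrence_counts(query_predicates)
    predicates = sorted({p for preds in query_predicates for p in preds})
    capacity = -(-len(predicates) // k)  # ceiling division: size cap per group

    # Union-find over predicates; each set is a group of co-located predicates.
    parent = {p: p for p in predicates}
    size = {p: 1 for p in predicates}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    # Merge the most frequently co-occurring pairs first, within the size cap.
    for (a, b), _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        ra, rb = find(a), find(b)
        if ra != rb and size[ra] + size[rb] <= capacity:
            parent[rb] = ra
            size[ra] += size[rb]

    groups = {}
    for p in predicates:
        groups.setdefault(find(p), []).append(p)

    # Pack groups into partitions: largest group first, least-loaded partition.
    loads = [0] * k
    assignment = {}
    for members in sorted(groups.values(), key=lambda g: (-len(g), g)):
        target = loads.index(min(loads))
        loads[target] += len(members)
        for p in members:
            assignment[p] = target
    return assignment


def partition_triples(triples, assignment, k):
    """Route each (subject, predicate, object) triple by its predicate."""
    parts = [[] for _ in range(k)]
    for s, p, o in triples:
        parts[assignment[p]].append((s, p, o))
    return parts


# Hypothetical workload: one list of predicates per logged query.
workload = [
    ["name", "age"], ["name", "age"],
    ["city", "country"], ["city", "country"],
    ["name", "city"],
]
assignment = group_predicates(workload, k=2)
triples = [("alice", "name", "Alice"), ("alice", "age", "30"),
           ("berlin", "country", "Germany")]
parts = partition_triples(triples, assignment, k=2)
```

In this toy workload, `name` and `age` end up in one partition and `city` and `country` in the other, so a query touching only one of those pairs can be answered from a single partition. The size cap is one possible way to keep partitions balanced; other balancing criteria, such as triple counts per predicate, would be equally plausible.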
Pages: 169-176
Page count: 8