Adaptive Distributed RDF Graph Fragmentation and Allocation based on Query Workload

被引:25
作者
Peng, Peng [1 ]
Zou, Lei [2 ]
Chen, Lei [3 ]
Zhao, Dongyan [2 ]
机构
[1] Hunan Univ, Changsha 410006, Hunan, Peoples R China
[2] Peking Univ, Inst Comp Sci & Technol, Beijing 100080, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
基金
美国国家科学基金会;
关键词
Distributed RDF database; data fragmentation; data allocation; query workload; EFFICIENT;
D O I
10.1109/TKDE.2018.2841389
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
As massive volumes of Resource Description Framework (RDF) data are growing, designing a distributed RDF database system to manage them is necessary. In designing this system, it is very common to partition the RDF data into some parts, called fragments, which are then distributed. Thus, the distribution design comprises two steps: fragmentation and allocation. In this study, we explore the workload for fragmentation and allocation, which aims to reduce the communication cost during SPARQL query processing. Specifically, we adaptively maintain some frequent access patterns (FAPs) to reflect the characteristics of the workload while ensuring the data integrity and approximation ratio. Based on these frequent access patterns, we propose three fragmentation strategies, namely vertical, horizontal, and mixed fragmentation, to divide RDF graphs while meeting different types of query processing objectives. After fragmentation, we discuss how to allocate these fragments to various sites while balancing the fragments. Finally, we discuss how to process queries based on the results of fragmentation and allocation. Experiments over large RDF datasets confirm the superior performance of our proposed solutions.
引用
收藏
页码:670 / 685
页数:16
相关论文
共 30 条
[1]  
Aluç G, 2014, LECT NOTES COMPUT SC, V8796, P197, DOI 10.1007/978-3-319-11964-9_13
[2]  
Astrahan M. M., 1976, ACM Transactions on Database Systems, V1, P97, DOI 10.1145/320455.320457
[3]   Fast PNN-based clustering using K-nearest neighbor graph [J].
Fränti, P ;
Virmajoki, O ;
Hautamäki, V .
THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, :525-528
[4]   Partout: A Distributed Engine for Efficient RDF Processing [J].
Galarraga, Luis ;
Hose, Katja ;
Schenkel, Ralf .
WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014, :267-268
[5]   TriAD: A Distributed Shared-Nothing RDF Engine based on Asynchronous Message Passing [J].
Gurajada, Sairam ;
Seufert, Stephan ;
Miliaraki, Iris ;
Theobald, Martin .
SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, :289-300
[6]   Accelerating SPARQL queries by exploiting hash-based locality and adaptive partitioning [J].
Harbi, Razen ;
Abdelaziz, Ibrahim ;
Kalnis, Panos ;
Mamoulis, Nikos ;
Ebrahim, Yasser ;
Sahli, Majed .
VLDB JOURNAL, 2016, 25 (03) :355-380
[7]  
Hose K, 2013, I C DATA ENGIN WORKS, P1, DOI 10.1109/ICDEW.2013.6547414
[8]  
Huang JW, 2011, PROC VLDB ENDOW, V4, P1123
[9]   Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing [J].
Husain, Mohammad Farhan ;
McGlothlin, James ;
Masud, Mohammad Mehedy ;
Khan, Latifur R. ;
Thuraisingham, Bhavani .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (09) :1312-1327
[10]  
Karypis G, 1995, SUPERCOMP PROC, P658