MuSe: a multi-level storage scheme for big RDF data using MapReduce

Cited by: 6
Authors
Chawla, Tanvi [1]
Singh, Girdhari [1]
Pilli, Emmanuel S. [1]
Affiliations
[1] Malaviya Natl Inst Technol, Dept Comp Sci & Engn, Jaipur, India
Keywords
RDF; SPARQL; Hadoop; HDFS; MapReduce; Storage; Benchmark
DOI
10.1186/s40537-021-00519-6
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
The Resource Description Framework (RDF) model, owing to its flexible structure, is increasingly being used to represent Linked data. The rise in the amount of Linked data and knowledge graphs has increased the volume of RDF data. RDF is used to model metadata, especially for social media domains where the data is linked. With the plethora of RDF data sources available on the Web, scalable RDF data management becomes a tedious task. In this paper, we present MuSe, an efficient distributed RDF storage scheme for storing and querying RDF data with Hadoop MapReduce. In MuSe, Big RDF data is stored at two levels for answering the common triple patterns in SPARQL queries. MuSe considers the types of frequently occurring triple patterns and optimizes RDF storage to answer such triple patterns in minimum time. It accesses only the tables that are sufficient for answering a triple pattern instead of scanning the whole RDF dataset. Extensive experiments on two synthetic RDF datasets, LUBM and WatDiv, show that MuSe outperforms the compared state-of-the-art frameworks in terms of query execution time and scalability.
Pages: 26