RDF partitioning for scalable SPARQL query processing

Cited by: 6

Authors:

Wang, Xiaoyan ^{[1,2,3]}

Yang, Tao ^{[1]}

Chen, Jinchuan ^{[2]}

He, Long ^{[1]}

Du, Xiaoyong ^{[1,2,4]}

Affiliations:

[1] Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China

[2] Renmin Univ, Minist Educ, Key Lab Data Engn & Knowledge Engn, Beijing 100872, Peoples R China

[3] Supreme Peoples Court, Informat Ctr, Beijing 100745, Peoples R China

[4] Beihang Univ, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China

Source:

FRONTIERS OF COMPUTER SCIENCE | 2015 / Vol. 9 / Issue 06

Keywords:

RDF data; data partitioning; SPARQL query;

DOI:

10.1007/s11704-015-4104-3

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The volume of RDF data has increased dramatically in recent years, and cloud computing platforms such as Hadoop are considered a good choice for processing queries over huge data sets because of their scalability. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through careful splitting of HDFS files and algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned can also affect system performance. Specifically, a good partitioning solution can greatly reduce or even entirely avoid cross-node joins, and significantly cut the cost of query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture, where Map/Reduce takes charge of the computing tasks, and RDF query engines such as RDF-3X store the data and execute join operations. Based on an analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data and an approximate solution for physically placing the partitions in order to reduce data redundancy. It also discusses how to make a good trade-off between query evaluation efficiency and data redundancy. All of the proposed approaches have been evaluated by extensive experiments over large RDF data sets.

Pages: 919-933

Page count: 15
