RDF partitioning for scalable SPARQL query processing

Cited by: 6
Authors
Wang, Xiaoyan [1, 2, 3]
Yang, Tao [1]
Chen, Jinchuan [2]
He, Long [1]
Du, Xiaoyong [1, 2, 4]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China
[2] Renmin Univ, Minist Educ, Key Lab Data Engn & Knowledge Engn, Beijing 100872, Peoples R China
[3] Supreme Peoples Court, Informat Ctr, Beijing 100745, Peoples R China
[4] Beihang Univ, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Keywords
RDF data; data partitioning; SPARQL query
DOI
10.1007/s11704-015-4104-3
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The volume of RDF data has increased dramatically in recent years, and cloud computing platforms such as Hadoop are considered a good choice for processing queries over huge data sets because of their scalability. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through careful splitting of HDFS files and algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned can also affect system performance. Specifically, a good partitioning solution can greatly reduce or even entirely avoid cross-node joins, and significantly cut the cost of query evaluation. Building on HadoopDB, this work processes SPARQL queries in a hybrid architecture, where Map/Reduce takes charge of the computing tasks, and RDF query engines such as RDF-3X store the data and execute join operations. Based on an analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data and an approximate solution for physically placing the partitions so as to reduce data redundancy. It also discusses how to strike a good trade-off between query evaluation efficiency and data redundancy. All of the proposed approaches have been evaluated through extensive experiments over large RDF data sets.
Pages: 919-933
Page count: 15