RDF partitioning for scalable SPARQL query processing

Cited by: 6
Authors
Wang, Xiaoyan [1, 2, 3]
Yang, Tao [1]
Chen, Jinchuan [2]
He, Long [1]
Du, Xiaoyong [1, 2, 4]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China
[2] Renmin Univ, Minist Educ, Key Lab Data Engn & Knowledge Engn, Beijing 100872, Peoples R China
[3] Supreme Peoples Court, Informat Ctr, Beijing 100745, Peoples R China
[4] Beihang Univ, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Keywords
RDF data; data partitioning; SPARQL query
DOI
10.1007/s11704-015-4104-3
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The volume of RDF data has increased dramatically in recent years, and cloud computing platforms such as Hadoop are considered a good choice for processing queries over huge data sets because of their scalability. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through careful splitting of HDFS files and algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned can also affect system performance. Specifically, a good partitioning solution can greatly reduce or even completely avoid cross-node joins, and so significantly cut the cost of query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture, in which Map/Reduce takes charge of the computing tasks, and RDF query engines such as RDF-3X store the data and execute join operations. Based on an analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data and an approximate solution for physically placing the partitions in order to reduce data redundancy. It also discusses how to strike a good trade-off between query evaluation efficiency and data redundancy. All of the proposed approaches have been evaluated through extensive experiments over large RDF data sets.
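To illustrate why placement can remove cross-node joins, here is a minimal sketch that is not the algorithm proposed in the paper. It assumes a simple subject-hash placement rule instead of the workload-driven partitioning the abstract describes, and every name in it (`node_of`, `partition_by_subject`, `star_query`, the sample triples) is hypothetical. Because all triples that share a subject land on the same node, a star-shaped pattern such as `?x worksAt ?a . ?x name ?b` can be answered by each node on its own, and the per-node answers only need to be unioned.

```python
from collections import defaultdict
from zlib import crc32


def node_of(subject: str, num_nodes: int) -> int:
    """Deterministically map a subject to a node (assumed placement rule)."""
    return crc32(subject.encode("utf-8")) % num_nodes


def partition_by_subject(triples, num_nodes):
    """Place every (s, p, o) triple on the node chosen by its subject."""
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[node_of(s, num_nodes)].append((s, p, o))
    return partitions


def star_query_local(partition, predicates):
    """Answer a star pattern on one node: subjects having all predicates."""
    wanted = set(predicates)
    seen = defaultdict(set)
    for s, p, _ in partition:
        seen[s].add(p)
    return {s for s, ps in seen.items() if wanted <= ps}


def star_query(partitions, predicates):
    """Union the per-node answers; no triples are exchanged between nodes."""
    result = set()
    for part in partitions.values():
        result |= star_query_local(part, predicates)
    return result


# Hypothetical sample data, not taken from the paper.
TRIPLES = [
    ("alice", "worksAt", "RUC"),
    ("alice", "name", "Alice"),
    ("bob", "worksAt", "BUAA"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
]

if __name__ == "__main__":
    parts = partition_by_subject(TRIPLES, num_nodes=3)
    print(sorted(star_query(parts, ["worksAt", "name"])))
```

Subject hashing only helps patterns that join on a shared subject. Queries that join a subject to an object would still cross nodes under this rule, which is the kind of case where a workload-aware partitioning, possibly with some replicated triples, becomes relevant.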
Pages: 919-933
Page count: 15
Related papers (50 in total)
  • [1] RDF partitioning for scalable SPARQL query processing
    Xiaoyan WANG
    Tao YANG
    Jinchuan CHEN
    Long HE
    Xiaoyong DU
    Frontiers of Computer Science, 2015, 9 (06) : 919 - 933
  • [2] RDF partitioning for scalable SPARQL query processing
    Xiaoyan Wang
    Tao Yang
    Jinchuan Chen
    Long He
    Xiaoyong Du
    Frontiers of Computer Science, 2015, 9 : 919 - 933
  • [3] Towards efficient SPARQL query processing on RDF data
    Liu C.
    Wang H.
    Yu Y.
    Xu L.
    Tsinghua Science and Technology, 2010, 15 (06) : 613 - 622
  • [4] Towards Efficient SPARQL Query Processing on RDF Data
    Liu Chang
    Wang Haofen
    Yu Yong
    Xu Linhao
    Tsinghua Science and Technology, 2010, 15 (06) : 613 - 622
  • [5] Research on Efficient SPARQL Query Processing for RDF Data
    Zhang, Yi
    PROCEEDINGS OF THE 2015 2ND INTERNATIONAL WORKSHOP ON MATERIALS ENGINEERING AND COMPUTER SCIENCES (IWMECS 2015), 2015, 33 : 476 - 482
  • [6] Efficient and Scalable SPARQL Query Processing with Transformed Table
    Huang, Sheng-Wei
    Yu, Chia-Ho
    Shieh, Ce-Kuen
    Tsai, Ming-Fong
    2015 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE WORKSHOPS (WCNCW), 2015, : 103 - 106
  • [7] A Decentralized Architecture for SPARQL Query Processing and RDF Sharing: A Position Paper
    Marx, Edgard
    Saleem, Muhammad
    Lytra, Ioanna
    Ngomo, Axel-Cyrille Ngonga
    2018 IEEE 12TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2018, : 274 - 277
  • [8] DIAERESIS: RDF data partitioning and query processing on SPARK
    Troullinou, Georgia
    Agathangelos, Giannis
    Kondylakis, Haridimos
    Stefanidis, Kostas
    Plexousakis, Dimitris
    SEMANTIC WEB, 2024, 15 (05) : 1763 - 1789
  • [9] RG-index: An RDF graph index for efficient SPARQL query processing
    Kim, Kisung
    Moon, Bongki
    Kim, Hyoung-Joo
    EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (10) : 4596 - 4607
  • [10] SPARQL Query Generation based on RDF Graph
    Kharrat, Mohamed
    Jedidi, Anis
    Gargouri, Faiez
    KDIR: PROCEEDINGS OF THE 8TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT - VOL. 1, 2016, : 450 - 455