Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing

Cited by: 89
Authors
Husain, Mohammad Farhan [1]
McGlothlin, James
Masud, Mohammad Mehedy
Khan, Latifur R. [2]
Thuraisingham, Bhavani [3, 4]
Affiliations
[1] Amazon Com, Seattle, WA 98121 USA
[2] Univ Texas Dallas, Dept Comp Sci, Richardson, TX 75080 USA
[3] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci CS, Richardson, TX 75080 USA
[4] Univ Texas Dallas, CSRC, Richardson, TX 75080 USA
Keywords
Hadoop; RDF; SPARQL; MapReduce
DOI
10.1109/TKDE.2011.103
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The semantic web is an emerging area that aims to augment human reasoning. Various technologies are being developed in this arena, and they have been standardized by the World Wide Web Consortium (W3C). One such standard is the Resource Description Framework (RDF). Semantic web technologies can be utilized to build efficient and scalable systems for cloud computing. With the explosion of semantic web technologies, large RDF graphs are commonplace. This poses significant challenges for the storage and retrieval of RDF graphs. Current frameworks do not scale to large RDF graphs and as a result do not address these challenges. In this paper, we describe a framework that we built using Hadoop to store and retrieve large numbers of RDF triples by exploiting the cloud computing paradigm. We describe a scheme to store RDF data in the Hadoop Distributed File System (HDFS). More than one Hadoop job (the smallest unit of execution in Hadoop) may be needed to answer a query because a single triple pattern in a query cannot simultaneously take part in more than one join in a single Hadoop job. To determine the jobs, we present a greedy algorithm that generates a query plan, whose worst-case cost is bounded, to answer a SPARQL Protocol and RDF Query Language (SPARQL) query. We use Hadoop's MapReduce framework to answer the queries. Our results show that we can store large RDF graphs in Hadoop clusters built with cheap commodity-class hardware. Furthermore, we show that our framework is scalable and efficient and can handle large amounts of RDF data, unlike traditional approaches.
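The constraint stated in the abstract, that a triple pattern can take part in at most one join per Hadoop job, can be illustrated with a small sketch. This is not the paper's algorithm and does not use Hadoop. It is a simplified greedy grouping written for illustration, and the names `variables` and `plan_jobs`, the tuple representation of triple patterns, and the tie-breaking rule are all assumptions.

```python
from collections import defaultdict


def variables(pattern):
    """Return the variables (terms starting with '?') of a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def plan_jobs(patterns):
    """Greedily group joins into rounds; each round stands in for one job.

    Hypothetical illustration only: within a round, every input takes part
    in at most one join, mirroring the constraint described in the abstract.
    Returns a list of rounds, each a list of the join variables used.
    """
    items = [frozenset(variables(p)) for p in patterns]
    jobs = []
    while len(items) > 1:
        used = set()   # indices of inputs already joined in this round
        joins = []     # (join variable, indices of joined inputs)
        while True:
            by_var = defaultdict(list)
            for i, item in enumerate(items):
                if i not in used:
                    for var in item:
                        by_var[var].append(i)
            candidates = {v: idx for v, idx in by_var.items() if len(idx) >= 2}
            if not candidates:
                break
            # Greedy choice: the variable shared by the most unused inputs;
            # ties are broken alphabetically (an arbitrary assumption).
            var = max(sorted(candidates), key=lambda v: len(candidates[v]))
            joins.append((var, candidates[var]))
            used.update(candidates[var])
        if not joins:
            raise ValueError("remaining inputs share no variable")
        # Unjoined inputs carry over; each join yields one merged input.
        next_items = [item for i, item in enumerate(items) if i not in used]
        for _, idx in joins:
            next_items.append(frozenset().union(*(items[i] for i in idx)))
        jobs.append([var for var, _ in joins])
        items = next_items
    return jobs


query = [
    ("?x", "rdf:type", "ub:Student"),
    ("?y", "rdf:type", "ub:Department"),
    ("?x", "ub:memberOf", "?y"),
]
plan = plan_jobs(query)
print(plan)
```

In this example the third pattern shares `?x` with the first pattern and `?y` with the second. Because it can join on only one of them per round, the sketch produces two rounds, one joining on `?x` and one on `?y`.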
Pages: 1312-1327
Number of pages: 16