A Distributed Graph Engine for Web Scale RDF Data

Cited by: 161
Authors
Zeng, Kai [1]
Yang, Jiacheng [2]
Wang, Haixun [3]
Shao, Bin [3]
Wang, Zhongyuan [3, 4]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] Columbia Univ, New York, NY USA
[3] Microsoft Res Asia, Beijing, Peoples R China
[4] Renmin Univ China, Beijing, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2013 / Vol. 6 / No. 4
DOI
10.14778/2535570.2488333
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Much work has been devoted to supporting RDF data. But state-of-the-art systems and methods still cannot handle web-scale RDF data effectively. Furthermore, many useful and general-purpose graph-based operations (e.g., random walk, reachability, community discovery) on RDF data are not supported, as most existing systems store and index data in particular ways (e.g., as relational tables or as a bitmap matrix) to maximize one particular operation on RDF data: SPARQL query processing. In this paper, we introduce Trinity.RDF, a distributed, memory-based graph engine for web-scale RDF data. Instead of managing the RDF data in triple stores or as bitmap matrices, we store RDF data in its native graph form. Trinity.RDF achieves much better (sometimes orders of magnitude better) performance for SPARQL queries than the state-of-the-art approaches. Furthermore, since the data is stored in its native graph form, the system can support other operations (e.g., random walks, reachability) on RDF graphs as well. We conduct comprehensive experimental studies on real-life, web-scale RDF data to demonstrate the effectiveness of our approach.
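The abstract's central design choice is to keep RDF in its native graph form rather than in triple tables or bitmap matrices. The toy, single-machine Python sketch below illustrates that idea. It is not Trinity.RDF's implementation, which is distributed and memory-based. The class `RDFGraph`, its methods and the `ex:` identifiers are hypothetical names chosen for illustration.

Each node keeps lists of labeled outgoing and incoming edges. This has two consequences:

- A triple pattern with a bound subject or object is answered by following edges from that node.
- A general graph operation such as reachability runs directly on the same structure, without a separate index.

```python
from collections import defaultdict, deque


class RDFGraph:
    """Hypothetical toy RDF store: triples kept as labeled adjacency lists."""

    def __init__(self):
        # subject -> list of (predicate, object) pairs (outgoing edges)
        self.out_edges = defaultdict(list)
        # object -> list of (predicate, subject) pairs (incoming edges)
        self.in_edges = defaultdict(list)

    def add_triple(self, s, p, o):
        """Store triple (s, p, o) as an edge s -p-> o, indexed at both ends."""
        self.out_edges[s].append((p, o))
        self.in_edges[o].append((p, s))

    def objects(self, s, p):
        """Answer the triple pattern (s, p, ?o) by scanning s's outgoing edges."""
        return [o for pred, o in self.out_edges.get(s, []) if pred == p]

    def subjects(self, p, o):
        """Answer the triple pattern (?s, p, o) by scanning o's incoming edges."""
        return [s for pred, s in self.in_edges.get(o, []) if pred == p]

    def reachable(self, src, dst):
        """Breadth-first search over outgoing edges, ignoring predicate labels."""
        seen = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                return True
            for _, nxt in self.out_edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False


if __name__ == "__main__":
    g = RDFGraph()
    g.add_triple("ex:Alice", "ex:knows", "ex:Bob")
    g.add_triple("ex:Bob", "ex:knows", "ex:Carol")
    g.add_triple("ex:Carol", "ex:worksAt", "ex:Acme")
    print(g.objects("ex:Alice", "ex:knows"))    # ['ex:Bob']
    print(g.subjects("ex:worksAt", "ex:Acme"))  # ['ex:Carol']
    print(g.reachable("ex:Alice", "ex:Acme"))   # True
```

Both SPARQL-style lookups and reachability use the same adjacency lists. This reflects the abstract's argument that a native graph layout serves query processing and general graph operations alike. A production engine would additionally partition nodes across machines and optimize multi-pattern queries, which this sketch does not attempt.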
Pages: 265-276
Page count: 12
Cited references: 36