Creating voiD descriptions for Web-scale data

Cited by: 25
Authors
Boehm, Christoph [1]
Lorey, Johannes [1]
Naumann, Felix [1]
Affiliation
[1] Hasso Plattner Inst, D-14482 Potsdam, Germany
Source
Journal of Web Semantics | 2011, Vol. 9, Issue 3
Keywords
Semantic Web; Vocabulary of Interlinked Data; Semantic data profiling; RDF metadata generation; Cloud computing
DOI
10.1016/j.websem.2011.06.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When working with large amounts of crawled semantic data as provided by the Billion Triple Challenge (BTC), it is desirable to present the data in a manner best suited for end users. This includes conceiving and presenting explanatory metainformation. The Vocabulary of Interlinked Data (voiD) has been proposed as a means to annotate sets of RDF resources to facilitate not only human understanding, but also query optimization. In this article we introduce tools that automatically generate voiD descriptions for large datasets. Our approach comprises different means to identify (sub)datasets and annotate the derived subsets according to the voiD specification. Due to the complexity of Web-scale Linked Data, all algorithms used for partitioning and augmenting are implemented in a cloud environment utilizing the MapReduce paradigm. We employed the Billion Triple Challenge 2010 dataset [6] to evaluate our approach, and present the results in this article. We have released a tool named voiDgen to the public that allows the generation of metainformation for such large datasets. © 2011 Elsevier B.V. All rights reserved.
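As a rough illustration of the idea in the abstract (partition the triples into subdatasets, then annotate each partition with voiD statistics), the following is a minimal single-process sketch of a map and reduce step. It is not the voiDgen implementation. Partitioning by the subject's scheme and host is an assumption chosen for brevity, and the function names (`map_triple`, `reduce_partition`, `describe`, `to_turtle`) and the sample data are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

VOID_PREFIX = "@prefix void: <http://rdfs.org/ns/void#> .\n"

# Hypothetical sample data: (subject, predicate, object) tuples.
SAMPLE = [
    ("http://a.example/s1", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
    ("http://a.example/s1", "http://xmlns.com/foaf/0.1/knows", "http://b.example/s2"),
    ("http://b.example/s2", "http://xmlns.com/foaf/0.1/name", '"Bob"'),
]


def map_triple(triple):
    """Map step: key a triple by the URI space (scheme + host) of its subject.

    Assumption: one subdataset per subject host. Other partitioning
    criteria are possible and are not shown here.
    """
    subject, _, _ = triple
    parsed = urlparse(subject)
    return f"{parsed.scheme}://{parsed.netloc}/", triple


def reduce_partition(uri_space, triples):
    """Reduce step: compute simple voiD statistics for one partition."""
    distinct = set(triples)  # ignore duplicate triples
    return {
        "uriSpace": uri_space,
        "triples": len(distinct),
        "distinctSubjects": len({s for s, _, _ in distinct}),
        "properties": len({p for _, p, _ in distinct}),
        "distinctObjects": len({o for _, _, o in distinct}),
    }


def describe(triples):
    """Group mapped triples by key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for triple in triples:
        key, value = map_triple(triple)
        groups[key].append(value)
    return [reduce_partition(key, groups[key]) for key in sorted(groups)]


def to_turtle(stats):
    """Serialize the statistics as voiD dataset descriptions in Turtle."""
    lines = [VOID_PREFIX]
    for i, s in enumerate(stats):
        lines.append(
            f'_:ds{i} a void:Dataset ;\n'
            f'    void:uriSpace "{s["uriSpace"]}" ;\n'
            f'    void:triples {s["triples"]} ;\n'
            f'    void:distinctSubjects {s["distinctSubjects"]} ;\n'
            f'    void:properties {s["properties"]} ;\n'
            f'    void:distinctObjects {s["distinctObjects"]} .\n'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle(describe(SAMPLE)))
```

In a real MapReduce deployment the grouping inside `describe` would be done by the framework's shuffle phase, and the reducer would have to stream over partitions too large to hold in memory; the sketch only shows the shape of the computation.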
Pages: 339-345
Number of pages: 7
Related papers
15 entries in total
[1] Alexander K., 2009, WWW WORKSH LINK DAT
[2] Alexander K., 2011, W3C Interest Group Note
[3] Alexander K., VOID BROWSER
[4] [Anonymous], 2006, Linked data
[5] [Anonymous], PUBLIC SECTOR INFORM
[6] [Anonymous], Billion Triple Challenge
[7] [Anonymous], OECD GLOSS STAT TERM
[8] Beckett D., 2008, Turtle - Terse RDF Triple Language
[9] Bizer, Christian; Lehmann, Jens; Kobilarov, Georgi; Auer, Soeren; Becker, Christian; Cyganiak, Richard; Hellmann, Sebastian. DBpedia - A crystallization point for the Web of Data [J]. Journal of Web Semantics, 2009, 7(3): 154-165
[10] Bohm C., 2010, WORKSH NEW TRENDS IN