BTC-2019: The 2019 Billion Triple Challenge Dataset

Cited by: 8
Authors
Herrera, Jose-Miguel [1]
Hogan, Aidan [1]
Kaefer, Tobias [2]
Affiliations
[1] Univ Chile, DCC, IMFD, Santiago, Chile
[2] Karlsruhe Inst Technol, Karlsruhe, Germany
Source
SEMANTIC WEB - ISWC 2019, PT II | 2019 / Vol. 11779
Keywords
SEMANTIC WEB CHALLENGE; LINKED DATA; DECOMPOSITION;
DOI
10.1007/978-3-030-30796-7_11
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Six datasets have been published under the title of Billion Triple Challenge (BTC) since 2008. Each such dataset contains billions of triples extracted from millions of documents crawled from hundreds of domains. While these datasets were originally motivated by the annual ISWC competition from which they take their name, they would become widely used in other contexts, forming a key resource for a variety of research works concerned with managing and/or analysing diverse, real-world RDF data as found natively on the Web. Given that the last BTC dataset was published in 2014, we prepare and publish a new version, BTC-2019, containing 2.2 billion quads parsed from 2.6 million documents on 394 pay-level domains. This paper first motivates the BTC datasets with a survey of research works using these datasets. Next, we provide details of how the BTC-2019 crawl was configured. We then present and discuss a variety of statistics that aim to give insights into the content of BTC-2019. Finally, we discuss the hosting of the dataset and the ways in which it can be accessed, remixed and used.
Pages: 163-180
Number of pages: 18