Roomba: An Extensible Framework to Validate and Build Dataset Profiles

Cited by: 12
Authors
Assaf, Ahmad [1, 2]
Troncy, Raphael [1]
Senart, Aline [2]
Affiliations
[1] EURECOM, Sophia Antipolis, France
[2] SAP Labs France, Mougins, France
Source
SEMANTIC WEB: ESWC 2015 SATELLITE EVENTS | 2015 / Vol. 9341
Keywords
Linked data; Dataset profile; Metadata; Data quality;
DOI
10.1007/978-3-319-25639-9_46
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Linked Open Data (LOD) has emerged as one of the largest collections of interlinked datasets on the web. To benefit from this mine of data, one needs access to descriptive information about each dataset (its metadata). This information can be used to delay data entropy, to enhance dataset discovery, exploration and reuse, and to help data portal administrators detect and eliminate spam. However, such metadata is currently limited to a few data portals, where it is usually provided manually and is therefore often incomplete and inconsistent in quality. To address these issues, we propose a scalable automatic approach for extracting, validating, correcting and generating descriptive linked dataset profiles. This approach applies several techniques to check the validity of the provided metadata and to generate descriptive and statistical information for a particular dataset or for an entire data portal.
Pages: 325-339
Page count: 15