Syntactical Heuristics for the Open Data Quality Assessment and Their Applications

Cited by: 2
Authors
Pirozzi, Donato [1 ]
Scarano, Vittorio [1 ]
Affiliation
[1] Univ Salerno, Dipartimento Informat, Salerno, Italy
Source
BUSINESS INFORMATION SYSTEMS WORKSHOPS (BIS 2018) | 2019 / Vol. 339
Keywords
Open data; Quality assessment; Type inferencing;
DOI
10.1007/978-3-030-04849-5_51
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Open Government Data initiatives are valuable for promoting transparency, accountability, and openness. The expectation is that they increase participation by engaging citizens, non-profit organisations, and companies in reusing Open Data (OD). A potential barrier to the exploitation of OD and to the engagement of the target audience is the low quality of available datasets [3, 14, 16]. Non-technical consumers are often unaware that data may have quality issues, taking for granted that datasets can be used immediately without any further manipulation. In reality, to reuse data, for instance to create visualisations, they need to clean the data, which requires time, resources, and proper skills. This reduces the chance of involving citizens. This paper tackles the quality barrier of raw tabular datasets (i.e. CSV), a popular format (three stars in Tim Berners-Lee's five-star scheme) for governmental Open Data. The objective is to increase awareness and provide support in data cleaning operations, both to public administrations (PAs), so that they produce better-quality Open Data, and to non-technical data consumers, so that they can reuse datasets. DataChecker is an open-source, modular JavaScript library, shared with the community and available on GitHub, that takes a tabular dataset as input and generates a machine-readable report based on data type inferencing (a data profiling technique). Based on this report, the Social Platform for Open Data (SPOD) provides quality cleaning suggestions to both PAs and end-users.
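The abstract describes type inferencing only at a high level. As an illustration of the general idea, the following is a minimal Python sketch, not the DataChecker implementation or its API: each cell is matched against a few syntactic patterns, the most frequent type in a column is taken as the column type, and cells that deviate from it are listed in a machine-readable report. The pattern set, the function names (infer_cell_type, profile_csv), the report fields, and the sample data are all assumptions made for this example.

```python
import csv
import io
import json
import re
from collections import Counter

# Hypothetical syntactic patterns; the heuristics used in the paper may differ.
PATTERNS = [
    ("integer", re.compile(r"[+-]?\d+")),
    ("number", re.compile(r"[+-]?\d+[.,]\d+")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
]


def infer_cell_type(value):
    """Return the name of the first pattern that matches the whole cell."""
    text = value.strip()
    if not text:
        return "empty"
    for name, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return name
    return "text"


def profile_csv(csv_text):
    """Infer a type per column and list the rows that deviate from it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    cell_types = {column: [] for column in columns}
    for row in reader:
        for column in columns:
            # Short rows yield None for missing cells; treat them as empty.
            cell_types[column].append(infer_cell_type(row[column] or ""))

    report = {}
    for column, types in cell_types.items():
        non_empty = Counter(t for t in types if t != "empty")
        dominant = non_empty.most_common(1)[0][0] if non_empty else "empty"
        report[column] = {
            "inferred_type": dominant,
            "type_counts": dict(Counter(types)),
            # Data rows are numbered from 1, excluding the header line.
            "mismatched_rows": [
                i for i, t in enumerate(types, start=1)
                if t not in (dominant, "empty")
            ],
            "empty_cells": types.count("empty"),
        }
    return report


# Made-up sample data, for illustration only.
SAMPLE = (
    "city,population,updated\n"
    "A,100,2018-01-15\n"
    "B,n/a,2018-01-15\n"
    "C,300,15/01/2018\n"
    "D,,2018-02-01\n"
)

if __name__ == "__main__":
    print(json.dumps(profile_csv(SAMPLE), indent=2))
```

In this sketch the "population" column is inferred as integer with row 2 flagged, and the "updated" column is inferred as date with row 3 flagged. A report of this kind is the sort of input from which cleaning suggestions could be derived.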
Pages: 591 / 602
Number of pages: 12
Related papers
23 in total
[1]   Protection and Preservation of Campania Cultural Heritage engaging local communities via the use of Open Data [J].
Ambrosino, Maria Anna ;
Andriessen, Jerry ;
Annunziata, Vanja ;
De Santo, Massimo ;
Luciano, Carmela ;
Pardijs, Mirjam ;
Pirozzi, Donato ;
Santangelo, Gianluca .
PROCEEDINGS OF THE 19TH ANNUAL INTERNATIONAL CONFERENCE ON DIGITAL GOVERNMENT RESEARCH (DGO 2018): GOVERNANCE IN THE DATA AGE, 2018, :428-435
[2]  
Andriessen J, 2017, INT CONF EDEMOC EGOV, P47, DOI 10.1109/ICEDEG.2017.7962512
[3]   Open Data Hopes and Fears Determining the barriers of Open Data [J].
Beno, Martin ;
Figl, Kathrin ;
Umbrich, Juergen ;
Polleres, Axel .
2017 7TH INTERNATIONAL CONFERENCE FOR E-DEMOCRACY AND OPEN GOVERNMENT (CEDEM), 2017, :69-81
[4]  
Berners-Lee T., 2006, Linked Data - Design issues
[5]  
Castro D., 2015, Open data in the G8: A review of progress on the open data charter
[6]  
Commission E., 2017, OP DAT MAT EUR
[7]  
Commission E, 2017, RE US OP DAT
[8]  
Commission E, 2017, OP DAT PORT
[9]   Engaging Citizens with a Social Platform for Open Data [J].
Cordasco, Gennaro ;
De Donato, Renato ;
Malandrino, Delfina ;
Palmieri, Giuseppina ;
Petta, Andrea ;
Pirozzi, Donato ;
Santangelo, Gianluca ;
Scarano, Vittorio ;
Serra, Luigi ;
Spagnuolo, Carmine ;
Vicidomini, Luca .
DG.O 2017: THE PROCEEDINGS OF THE 18TH ANNUAL INTERNATIONAL CONFERENCE ON DIGITAL GOVERNMENT RESEARCH: INNOVATIONS AND TRANSFORMATIONS IN GOVERNMENT, 2017, :242-249
[10]  
Dawes SS, 2010, LECT NOTES COMPUT SC, V6228, P50, DOI 10.1007/978-3-642-14799-9_5