Syntactical Heuristics for the Open Data Quality Assessment and Their Applications

Cited by: 1
Authors
Pirozzi, Donato [1]
Scarano, Vittorio [1]
Affiliations
[1] Univ Salerno, Dipartimento Informat, Salerno, Italy
Source
BUSINESS INFORMATION SYSTEMS WORKSHOPS (BIS 2018) | 2019 / Vol. 339
Keywords
Open data; Quality assessment; Type inferencing
DOI
10.1007/978-3-030-04849-5_51
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Open Government Data are valuable initiatives in favour of transparency, accountability, and openness. The expectation is to increase participation by engaging citizens, non-profit organisations, and companies in reusing Open Data (OD). A potential barrier to the exploitation of OD and the engagement of the target audience is the low quality of available datasets [3, 14, 16]. Non-technical consumers are often unaware that data may have quality issues, taking for granted that datasets can be used immediately without any further manipulation. In reality, in order to reuse data, for instance to create visualisations, they need to perform data cleaning, which requires time, resources, and proper skills. This reduces the chance of involving citizens. This paper tackles the quality barrier of raw tabular datasets (i.e. CSV), a popular format for Governmental Open Data (three stars in Tim Berners-Lee's five-star scheme). The objective is to increase awareness and provide support in data cleaning operations both to Public Administrations (PAs), so that they produce better-quality Open Data, and to non-technical data consumers, so that they can reuse datasets. DataChecker is an open-source, modular JavaScript library, shared with the community and available on GitHub, that takes a tabular dataset as input and generates a machine-readable report based on data type inferencing (a data profiling technique). Based on this report, the Social Platform for Open Data (SPOD) provides data cleaning suggestions to both PAs and end-users.
Pages: 591-602
Number of pages: 12