Syntactical Heuristics for the Open Data Quality Assessment and Their Applications

Cited by: 1
Authors
Pirozzi, Donato [1 ]
Scarano, Vittorio [1 ]
Affiliations
[1] Univ Salerno, Dipartimento Informat, Salerno, Italy
Source
BUSINESS INFORMATION SYSTEMS WORKSHOPS (BIS 2018) | 2019 / Vol. 339
Keywords
Open data; Quality assessment; Type inferencing;
DOI
10.1007/978-3-030-04849-5_51
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Open Government Data initiatives are valuable for transparency, accountability, and openness. The expectation is to increase participation by engaging citizens, non-profit organisations, and companies in reusing Open Data (OD). A potential barrier to the exploitation of OD and the engagement of the target audience is the low quality of available datasets [3, 14, 16]. Non-technical consumers are often unaware that data may have quality issues, taking for granted that datasets can be used immediately without any further manipulation. In reality, to reuse data, for instance to create visualisations, they need to perform data cleaning, which requires time, resources, and proper skills. This reduces the chance of involving citizens. This paper tackles the quality barrier of raw tabular datasets (i.e. CSV), a popular format (three stars in Tim Berners-Lee's rating scheme) for Governmental Open Data. The objective is to increase awareness and provide support in data cleaning operations, both to public administrations (PAs), so that they can produce better-quality Open Data, and to non-technical data consumers, so that they can reuse datasets. DataChecker is an open source, modular JavaScript library, shared with the community and available on GitHub, that takes a tabular dataset as input and generates a machine-readable report based on data type inferencing (a data profiling technique). Based on this report, the Social Platform for Open Data (SPOD) provides quality cleaning suggestions to both PAs and end-users.
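The idea of syntactic type inferencing over a CSV file can be illustrated with a minimal sketch. This is not DataChecker's code or API (the library itself is written in JavaScript); the function names, the regular expressions, the type labels, and the report layout below are all assumptions chosen for illustration. Each cell is matched against simple patterns, the most frequent non-empty type becomes the column's inferred type, and cells that deviate from it are counted as candidates for cleaning.

```python
import csv
import io
import json
import re
from collections import Counter

# Hypothetical syntactic patterns, tried in order; not taken from DataChecker.
PATTERNS = [
    ("integer", re.compile(r"^[+-]?\d+$")),
    ("float", re.compile(r"^[+-]?\d+\.\d+$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]


def infer_value_type(value):
    """Classify one cell by its surface syntax only."""
    text = value.strip()
    if text == "":
        return "empty"
    for name, pattern in PATTERNS:
        if pattern.match(text):
            return name
    return "text"


def profile_csv(csv_text):
    """Return a JSON-serialisable report with one entry per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    counts = {name: Counter() for name in columns}
    for row in reader:
        for name in columns:
            # Short rows yield None for missing cells; treat them as empty.
            counts[name][infer_value_type(row[name] or "")] += 1

    report = {}
    for name, counter in counts.items():
        non_empty = {t: n for t, n in counter.items() if t != "empty"}
        # The dominant non-empty type is taken as the column type.
        dominant = max(non_empty, key=non_empty.get) if non_empty else "empty"
        total = sum(counter.values())
        report[name] = {
            "inferred_type": dominant,
            "type_counts": dict(counter),
            # Cells that do not match the dominant type (including empty ones).
            "suspect_cells": total - counter[dominant],
        }
    return report


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = (
        "city,population,updated\n"
        "Salerno,133970,2018-07-01\n"
        "Napoli,n/a,2018-07-01\n"
        "Avellino,54222,\n"
    )
    print(json.dumps(profile_csv(sample), indent=2))
```

A consumer such as a data portal could read the `suspect_cells` count per column to decide where to show a cleaning suggestion; how SPOD actually consumes the DataChecker report is not described in this abstract.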
Pages: 591-602
Page count: 12
Related papers
50 in total
  • [31] QUALITY ANALYSIS OF OPEN STREET MAP DATA
    Wang Ming
    Li Qingquan
    Hu Qingwu
    Zhou Meng
    8TH INTERNATIONAL SYMPOSIUM ON SPATIAL DATA QUALITY, 2013, 40-2 (w1): 155 - 158
  • [32] A Privacy Risk Assessment Model for Open Data
    Ali-Eldin, Amr
    Zuiderwijk, Anneke
    Janssen, Marijn
    BUSINESS MODELING AND SOFTWARE DESIGN, BMSD 2017, 2018, 309 : 186 - 201
  • [33] Open Data Assessment in Italian and Spanish Cities
    Sisto, Raffaele
    Garcia Lopez, Javier
    Manuel Paez, Jose
    Mate Mugica, Elena
    SMART AND SUSTAINABLE PLANNING FOR CITIES AND REGIONS, SSPCR 2017, 2018: 121 - 131
  • [34] Data-oriented QMOOD model for quality assessment of multi-client software applications
    Ozcevik, Yusuf
    ENGINEERING SCIENCE AND TECHNOLOGY-AN INTERNATIONAL JOURNAL-JESTECH, 2024, 51
  • [35] Quality Assessment and Biases in Reused Data
    Fernandez-Ardevo, Mireia
    Rosales, Andrea
    AMERICAN BEHAVIORAL SCIENTIST, 2024, 68 (05) : 696 - 710
  • [36] Customized Quality Assessment of Healthcare Data
    Shin, Jieun
    Kim, Jong-Yeup
    ANNALS OF LABORATORY MEDICINE, 2024, 44 (06) : 472 - 477
  • [37] Linked Data Quality Assessment: A Survey
    Nayak, Aparna
    Bozic, Bojan
    Longo, Luca
    WEB SERVICES - ICWS 2021, 2022, 12994 : 63 - 76
  • [38] Quality Assessment of Imputations in Administrative Data
    Schnetzer, Matthias
    Astleithner, Franz
    Cetkovic, Predrag
    Humer, Stefan
    Lenk, Manuela
    Moser, Mathias
    JOURNAL OF OFFICIAL STATISTICS, 2015, 31 (02) : 231 - 247
  • [39] Geospatial Future Is Open: Lessons Learnt from Applications Based on Open Data
    Cuca, Branka
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2016, PT III, 2016, 9788 : 491 - 502
  • [40] Enhancing Visualization Applications Using Open Data Sources
    Suwanworaboon, Ponlakit
    Lynden, Steven
    Tuarob, Suppawong
    2020 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE), 2020: 30 - 35