Syntactical Heuristics for the Open Data Quality Assessment and Their Applications

Cited by: 1
Authors
Pirozzi, Donato [1]
Scarano, Vittorio [1]
Affiliations
[1] Univ Salerno, Dipartimento Informat, Salerno, Italy
Source
BUSINESS INFORMATION SYSTEMS WORKSHOPS (BIS 2018) | 2019 / Vol. 339
Keywords
Open data; Quality assessment; Type inferencing;
DOI
10.1007/978-3-030-04849-5_51
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Open Government Data are valuable initiatives in favour of transparency, accountability, and openness. The expectation is to increase participation by engaging citizens, non-profit organisations, and companies in reusing Open Data (OD). A potential barrier to the exploitation of OD and to the engagement of the target audience is the low quality of available datasets [3, 14, 16]. Non-technical consumers are often unaware that data may have quality issues, taking for granted that datasets can be used immediately without any further manipulation. In reality, in order to reuse data, for instance to create visualisations, they need to perform data cleaning, which requires time, resources, and proper skills. This reduces the chance of involving citizens. This paper tackles the quality barrier of raw tabular datasets (i.e. CSV), a popular format (three stars on Tim Berners-Lee's scale) for Governmental Open Data. The objective is to increase awareness and to support data cleaning operations, both for Public Administrations (PAs) producing better-quality Open Data and for non-technical data consumers reusing datasets. DataChecker is an open source, modular JavaScript library, shared with the community and available on GitHub, that takes a tabular dataset as input and generates a machine-readable report based on data type inferencing (a data profiling technique). Based on this report, the Social Platform for Open Data (SPOD) provides quality cleaning suggestions to both PAs and end-users.
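To make the idea of a report based on data type inferencing concrete, the following is a minimal sketch in Python. It is not the DataChecker API, which is a JavaScript library whose actual heuristics are not described in this record. The names `PATTERNS`, `infer_cell_type`, `profile_csv`, and `SAMPLE`, the chosen patterns, and the sample rows are all hypothetical and serve only to illustrate syntactic, per-column type inference on a CSV.

```python
import csv
import io
import json
import re
from collections import Counter

# Hypothetical syntactic patterns; the real DataChecker heuristics may differ.
PATTERNS = [
    ("integer", re.compile(r"^[+-]?\d+$")),
    ("float", re.compile(r"^[+-]?\d+[.,]\d+$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]


def infer_cell_type(value):
    """Return a type label for one cell, based only on its syntax."""
    text = value.strip()
    if not text:
        return "empty"
    for name, pattern in PATTERNS:
        if pattern.match(text):
            return name
    return "text"


def profile_csv(csv_text):
    """Build a per-column report: majority type, type counts, empties, mismatches."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: Counter() for name in reader.fieldnames or []}
    for row in reader:
        for name in counts:
            counts[name][infer_cell_type(row.get(name) or "")] += 1

    report = {}
    for name, counter in counts.items():
        # The majority type is chosen among non-empty cells only.
        typed = {t: n for t, n in counter.items() if t != "empty"}
        inferred = max(typed, key=typed.get) if typed else "empty"
        report[name] = {
            "inferred_type": inferred,
            "counts": dict(counter),
            "empty": counter["empty"],
            # Non-empty cells whose type disagrees with the majority type.
            "mismatches": sum(typed.values()) - typed.get(inferred, 0),
        }
    return report


# Made-up sample data, for illustration only.
SAMPLE = (
    "city,population,updated\n"
    "Salerno,133970,2018-07-01\n"
    "Napoli,n/a,2018-07-02\n"
    "Avellino,54222,\n"
)

if __name__ == "__main__":
    # The report is plain JSON, so other tools can consume it.
    print(json.dumps(profile_csv(SAMPLE), indent=2))
```

In this sketch the `population` column is inferred as `integer` with one mismatching cell (`n/a`), and the `updated` column has one empty cell. A platform could turn flags of this kind into cleaning suggestions.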
Pages: 591 - 602
Page count: 12
Related papers
50 in total
  • [21] An Assessment of Open Data Sets Completeness
    Ali, Abdulrazzak
    Emran, Nurul A.
    Asmai, Siti A.
    Ismail, Amelia R.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (06) : 557 - 562
  • [22] Intrinsic and extrinsic quality of data for open data repositories
    Gonzalez-Vidal, Aurora
    Ramallo-Gonzalez, Alfonso P.
    Skarmeta, Antonio F.
    ICT EXPRESS, 2022, 8 (03) : 328 - 333
  • [23] Assessing data quality in Open Data: A case study
    John Ferney, Mahecha Moyano
    Nicolas Estefan, Lopez Beltran
    John Alexander, Velandia Vega
    2017 CONGRESO INTERNACIONAL DE INNOVACION Y TENDENCIAS EN INGENIERIA (CONIITI), 2017,
  • [24] Model-driven Engineering IDE for Quality Assessment of Data-intensive Applications
    Gil, Marc
    Joubert, Christophe
    Torres, Ismael
    ICPE'17: COMPANION OF THE 2017 ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING, 2017, : 173 - 174
  • [25] Trust in open data applications through transparency
    Wiencierz, Christian
    Luenich, Marco
    NEW MEDIA & SOCIETY, 2022, 24 (08) : 1751 - 1770
  • [26] OPEN DATA FOR TERRITORIAL SPECIALIZATION ASSESSMENT TERRITORIAL SPECIALIZATION IN ATTRACTING LOCAL DEVELOPMENT FUNDS: AN ASSESSMENT PROCEDURE BASED ON OPEN DATA AND OPEN TOOLS
    Casas, Giuseppe Las
    Lombard, Silvana
    Murgante, Beniamino
    Pontrandolfi, Piergiuseppe
    Scorza, Francesco
    TEMA-JOURNAL OF LAND USE MOBILITY AND ENVIRONMENT, 2014, : 581 - 595
  • [27] Quality and maturity model for open data portals
    Oviedo, Edgar
    Norberto Mazon, Jose
    Jacobo Zubcoff, Jose
    2015 XLI LATIN AMERICAN COMPUTING CONFERENCE (CLEI), 2015, : 457 - 462
  • [28] Agile Production of High Quality Open Data
    De Donato, Renato
    Ferretti, Giuseppe
    Marciano, Antonio
    Palmieri, Giuseppina
    Pirozzi, Donato
    Scarano, Vittorio
    Vicidomini, Luca
    PROCEEDINGS OF THE 19TH ANNUAL INTERNATIONAL CONFERENCE ON DIGITAL GOVERNMENT RESEARCH (DGO 2018): GOVERNANCE IN THE DATA AGE, 2018, : 718 - 727
  • [29] Quality Issues of Public Procurement Open Data
    Csaki, Csaba
    Prier, Eric
    ELECTRONIC GOVERNMENT AND THE INFORMATION SYSTEMS PERSPECTIVE, EGOVIS 2018, 2018, 11032 : 177 - 191
  • [30] Proposal to Measure the Quality of Open Data Sets
    Mendez Matamoros, Jorge Hernando
    Rodriguez Rojas, Luz Andrea
    Tarazona Bermudez, Giovanny Mauricio
    KNOWLEDGE MANAGEMENT IN ORGANIZATIONS, KMO 2018, 2018, 877 : 701 - 709