Syntactical Heuristics for the Open Data Quality Assessment and Their Applications

Cited by: 1
Authors
Pirozzi, Donato [1]
Scarano, Vittorio [1]
Affiliation
[1] Univ Salerno, Dipartimento Informat, Salerno, Italy
Source
BUSINESS INFORMATION SYSTEMS WORKSHOPS (BIS 2018) | 2019 / Vol. 339
Keywords
Open data; Quality assessment; Type inferencing
DOI
10.1007/978-3-030-04849-5_51
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Open Government Data initiatives promote transparency, accountability, and openness. The expectation is to increase participation by engaging citizens, non-profit organisations, and companies in reusing Open Data (OD). A potential barrier to the exploitation of OD and the engagement of the target audience is the low quality of available datasets [3, 14, 16]. Non-technical consumers are often unaware that data may have quality issues, taking for granted that datasets can be used immediately without any further manipulation. In reality, in order to reuse data, for instance to create visualisations, they need to perform data cleaning, which requires time, resources, and proper skills. This reduces the chance of involving citizens. This paper tackles the quality barrier of raw tabular datasets (i.e. CSV), a popular format (three stars in Tim Berners-Lee's five-star scheme) for governmental Open Data. The objective is to increase awareness and provide support in data cleaning operations both to public administrations (PAs), so that they produce better quality Open Data, and to non-technical data consumers, so that they can reuse datasets. DataChecker is an open source, modular JavaScript library, shared with the community and available on GitHub, that takes a tabular dataset as input and generates a machine-readable report based on data type inferencing (a data profiling technique). Based on this report, the Social Platform for Open Data (SPOD) provides quality cleaning suggestions to both PAs and end-users.
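As a rough illustration of the data type inferencing idea described in the abstract, the following Python sketch classifies each cell of a CSV column with simple syntactic patterns, takes the most frequent type as the column type, and lists the rows that deviate from it. It is not the DataChecker API or the authors' set of heuristics: the patterns, the function names `infer_type` and `profile_csv`, the report fields, and the sample data are all assumptions made for this example.

```python
import csv
import io
import json
import re
from collections import Counter

# Hypothetical syntactic patterns; the heuristics used in the paper may differ.
PATTERNS = [
    ("integer", re.compile(r"^[+-]?\d+$")),
    ("float", re.compile(r"^[+-]?\d+[.,]\d+$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]


def infer_type(value):
    """Return a type label for one cell, based only on its syntax."""
    value = value.strip()
    if value == "":
        return "empty"
    for name, pattern in PATTERNS:
        if pattern.match(value):
            return name
    return "text"


def profile_csv(text):
    """Build a per-column report: dominant type, empty cells, deviating rows."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    report = {}
    for column in reader.fieldnames or []:
        types = [infer_type(row.get(column) or "") for row in rows]
        counts = Counter(t for t in types if t != "empty")
        dominant = counts.most_common(1)[0][0] if counts else "empty"
        report[column] = {
            "inferred_type": dominant,
            "empty_cells": types.count("empty"),
            # 1-based data row numbers whose type differs from the dominant one
            "suspicious_rows": [
                i + 1 for i, t in enumerate(types)
                if t not in (dominant, "empty")
            ],
        }
    return report


# Made-up sample data with a non-numeric population and an inconsistent date.
SAMPLE = "\n".join([
    "city,population,updated",
    "Alpha,1200,2018-07-01",
    "Beta,n/a,2018-07-02",
    "Gamma,5400,01/07/2018",
    "Delta,,2018-07-03",
])

report = profile_csv(SAMPLE)
print(json.dumps(report, indent=2))
```

A report of this kind is machine-readable, so a front end could turn each entry in `suspicious_rows` into a cleaning suggestion for the user, which is the role the abstract assigns to SPOD.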
Pages: 591-602
Number of pages: 12
Related papers
50 in total
  • [1] Automated Quality Assessment of Metadata across Open Data Portals
    Neumaier, Sebastian
    Umbrich, Jurgen
    Polleres, Axel
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2016, 8 (01)
  • [2] Quality Assessment for Open Government Data in China
    Li, Xiao-Tong
    Zhai, Jun
    Zheng, Gui-Fu
    Yuan, Chang-Feng
    ICIME 2018: PROCEEDINGS OF THE 2018 10TH INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT AND ENGINEERING, 2018 : 110 - 114
  • [3] Data Quality Assessment on Taiwan's Open Data Sites
    Lin, Cathy S.
    Yang, Hsin-Chang
    MULTIDISCIPLINARY SOCIAL NETWORKS RESEARCH, MISNC 2014, 2014, 473 : 325 - 333
  • [4] An Assessment of the Quality of Open Government Data in Saudi Arabia
    Alogaiel, Nada Faisal
    Alrwais, Omer Abdulaziz
    IEEE ACCESS, 2023, 11 : 61560 - 61599
  • [5] Importance of the Open Data Assessment: An Insight Into the (Meta) Data Quality Dimensions
    Slibar, Barbara
    Oreski, Dijana
    Redep, Nina Begicevic
    SAGE OPEN, 2021, 11 (02)
  • [6] Access Control and Quality Attributes of Open Data: Applications and Techniques
    Karafili, Erisa
    Spanaki, Konstantina
    Lupu, Emil C.
    BUSINESS INFORMATION SYSTEMS WORKSHOPS (BIS 2018), 2019, 339 : 603 - 614
  • [7] Interoperability-oriented Quality Assessment for Czech Open Data
    Kusnirakova, Dasa
    Ge, Mouzhi
    Walletzky, Leonard
    Buhnova, Barbora
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON DATA SCIENCE, TECHNOLOGY AND APPLICATIONS (DATA), 2022 : 446 - 453
  • [8] Systematic Literature Review of Data Quality in Open Government Data: Trend, Methods, and Applications
    Zainuddin, Zahirah
    Akhir, Emelia Akashah P.
    IEEE ACCESS, 2024, 12 : 148466 - 148487
  • [9] A Metrics-Driven Approach for Quality Assessment of Linked Open Data
    Behkamal, Behshid
    Kahani, Mohsen
    Bagheri, Ebrahim
    Jeremic, Zoran
    JOURNAL OF THEORETICAL AND APPLIED ELECTRONIC COMMERCE RESEARCH, 2014, 9 (02) : 64 - 79
  • [10] Open risk assessment: data
    Gilsenan, Mary B.
    Abbinante, Fabrizio
    O'Dea, Eileen
    Canals, Ana
    Tritscher, Angelika
    EFSA JOURNAL, 2016, 14