Data quality assurance in research data repositories: a theory-guided exploration and model

Cited by: 4
Authors
Stvilia, Besiki [1]
Lee, Dong Joon [2]
Affiliations
[1] Florida State Univ, Sch Informat, Tallahassee, FL 32306 USA
[2] Texas A&M Univ, Mays Business Sch, College Stn, TX USA
Keywords
Data quality; Data quality assurance; Research data repositories; Research data curation; Model; INFORMATION; FRAMEWORK
DOI
10.1108/JD-09-2023-0177
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Purpose: This study addresses the need for a theory-guided, rich, descriptive account of research data repositories' (RDRs) understanding of data quality and the structures of their data quality assurance (DQA) activities. Its findings can help develop operational DQA models and best practice guides and identify opportunities for innovation in the DQA activities.
Design/methodology/approach: The study analyzed 122 data repositories' applications for the Core Trustworthy Data Repositories, interview transcripts of 32 curators and repository managers and data curation-related webpages of their repository websites. The combined dataset represented 146 unique RDRs. The study was guided by a theoretical framework comprising activity theory and an information quality evaluation framework.
Findings: The study provided a theory-based examination of the DQA practices of RDRs, summarized as a conceptual model. The authors identified three DQA activities (evaluation, intervention and communication) and their structures, including activity motivations, roles played, mediating tools, and rules and standards. When defining data quality, study participants went beyond the traditional definition of data quality and referenced seven facets of ethical and effective information systems in addition to data quality. Furthermore, the participants and RDRs referenced 13 dimensions in their DQA models. The study revealed that DQA activities were prioritized by data value, level of quality, available expertise, cost and funding incentives.
Practical implications: The study's findings can inform the design and construction of digital research data curation infrastructure components on university campuses that aim to provide access not just to big data but to trustworthy data. Communities of practice focused on repositories and archives could consider adding FAIR operationalizations, extensions and metrics focused on data quality. The availability of such metrics and associated measurements can help reusers determine whether they can trust and reuse a particular dataset. The findings of this study can help to develop such data quality assessment metrics and intervention strategies in a sound and systematic way.
Originality/value: To the best of the authors' knowledge, this paper is the first data quality theory-guided examination of DQA practices in RDRs.
Pages: 793 - 812
Number of pages: 20
Related papers
50 items in total
  • [21] Open access image repositories: high-quality data to enable machine learning research
    Prior, F.
    Almeida, J.
    Kathiravelu, P.
    Kurc, T.
    Smith, K.
    Fitzgerald, T. J.
    Saltz, J.
    CLINICAL RADIOLOGY, 2020, 75 (01) : 7 - 12
  • [22] Hybrid Theory-Guided Data Driven Framework for Calculating Irrigation Water Use of Three Staple Cereal Crops in China
    Bo, Yong
    Li, Xueke
    Liu, Kai
    Wang, Shudong
    Li, Dehui
    Xu, Yu
    Wang, Mengmeng
    WATER RESOURCES RESEARCH, 2024, 60 (03)
  • [23] ECS: an interactive tool for data quality assurance
    Christian Sieberichs
    Simon Geerkens
    Alexander Braun
    Thomas Waschulzik
    AI and Ethics, 2024, 4 (01) : 131 - 139
  • [24] Quality Assurance for Security Applications of Big Data
    Clarke, Roger
    2016 EUROPEAN INTELLIGENCE AND SECURITY INFORMATICS CONFERENCE (EISIC), 2016, : 1 - 8
  • [25] Research on Data Quality of Data Warehouse
    Liu Shuanghong
    Han Zhongjun
    EBM 2010: INTERNATIONAL CONFERENCE ON ENGINEERING AND BUSINESS MANAGEMENT, VOLS 1-8, 2010, : 5255 - 5258
  • [26] Robust analysis and optimization of a novel efficient quality assurance model in data warehousing
    Amuthabala, P.
    Santhosh, R.
    COMPUTERS & ELECTRICAL ENGINEERING, 2019, 74 : 233 - 244
  • [27] Data quality measurement and assurance in medical registries
    Arts, DGT
    de Keizer, NF
    de Jonge, E
    MEDINFO 2001: PROCEEDINGS OF THE 10TH WORLD CONGRESS ON MEDICAL INFORMATICS, PTS 1 AND 2, 2001, 84 : 404 - 404
  • [28] Dynamic data maintenance for quality data, quality research
    Ozmen-Ertekin, Dilruba
    Ozbay, Kaan
    INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT, 2012, 32 (03) : 282 - 293
  • [29] Global overview of research data repositories: an analysis of re3data registry
    Khan, Aasif Mohammad
    Loan, Fayaz Ahmad
    Parray, Umer Yousuf
    Rashid, Sozia
    INFORMATION DISCOVERY AND DELIVERY, 2024, 52 (01) : 53 - 61
  • [30] Discovery of Domain Values for Data Quality Assurance
    Ciszak, Lukasz
    DEVELOPING CONCEPTS IN APPLIED INTELLIGENCE, 2011, 363 : 15 - 20