Data quality assurance in research data repositories: a theory-guided exploration and model

Cited by: 4
Authors
Stvilia, Besiki [1]
Lee, Dong Joon [2]
Affiliations
[1] Florida State Univ, Sch Informat, Tallahassee, FL 32306 USA
[2] Texas A&M Univ, Mays Business Sch, College Stn, TX USA
Keywords
Data quality; Data quality assurance; Research data repositories; Research data curation; Model; INFORMATION; FRAMEWORK
DOI
10.1108/JD-09-2023-0177
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Purpose: This study addresses the need for a theory-guided, rich, descriptive account of research data repositories' (RDRs) understanding of data quality and the structures of their data quality assurance (DQA) activities. Its findings can help develop operational DQA models and best practice guides and identify opportunities for innovation in the DQA activities.

Design/methodology/approach: The study analyzed 122 data repositories' applications for the Core Trustworthy Data Repositories, interview transcripts of 32 curators and repository managers and data curation-related webpages of their repository websites. The combined dataset represented 146 unique RDRs. The study was guided by a theoretical framework comprising activity theory and an information quality evaluation framework.

Findings: The study provided a theory-based examination of the DQA practices of RDRs, summarized as a conceptual model. The authors identified three DQA activities (evaluation, intervention and communication) and their structures, including activity motivations, roles played, mediating tools, and rules and standards. When defining data quality, study participants went beyond the traditional definition of data quality and referenced seven facets of ethical and effective information systems in addition to data quality. Furthermore, the participants and RDRs referenced 13 dimensions in their DQA models. The study revealed that DQA activities were prioritized by data value, level of quality, available expertise, cost and funding incentives.

Practical implications: The study's findings can inform the design and construction of digital research data curation infrastructure components on university campuses that aim to provide access not just to big data but to trustworthy data. Communities of practice focused on repositories and archives could consider adding FAIR operationalizations, extensions and metrics focused on data quality. The availability of such metrics and associated measurements can help reusers determine whether they can trust and reuse a particular dataset. The findings of this study can help to develop such data quality assessment metrics and intervention strategies in a sound and systematic way.

Originality/value: To the best of the authors' knowledge, this paper is the first data quality theory-guided examination of DQA practices in RDRs.
Pages: 793-812
Page count: 20
Related papers
50 in total
  • [1] Developing a data quality assurance ontology for research data repositories
    Lee, Dong Joon
    Stvilia, Besiki
    Gunaydin, Fatih
    Pang, Yuanying
    JOURNAL OF DOCUMENTATION, 2025, 81 (07) : 63 - 84
  • [2] Data Quality Assurance at Research Data Repositories
    Kindling M.
    Strecker D.
    Data Science Journal, 2022, 21 (01):
  • [3] Data quality assurance practices in research data repositories-A systematic literature review
    Stvilia, Besiki
    Pang, Yuanying
    Lee, Dong Joon
    Gunaydin, Fatih
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2025, 76 (01) : 238 - 261
  • [4] Data avatars: A theory-guided design and assessment for multidimensional data visualization
    Pflughoeft, Kurt A.
    Zahedi, Fatemeh Mariam
    Chen, Yan
    INFORMATION & MANAGEMENT, 2024, 61 (02)
  • [5] Data Quality Issues and Content Analysis for Research Data Repositories: The Case of Dryad
    Rousidis, Dimitris
    Garoufallou, Emmanouel
    Balatsoukas, Panos
    Sicilia, Miguel-Angel
    LET'S PUT DATA TO USE: DIGITAL SCHOLARSHIP FOR THE NEXT GENERATION, 2014, : 49 - 58
  • [6] Research of data quality assurance about ETL of telecom data warehouse
    Wei, S., 1839, Asian Network for Scientific Information (12): : 1839 - 1844
  • [8] A Quality Model for Linked Data Exploration
    Cappiello, Cinzia
    Di Noia, Tommaso
    Marcu, Bogdan Alexandru
    Matera, Maristella
    WEB ENGINEERING (ICWE 2016), 2016, 9671 : 397 - 404
  • [9] Predicting reward-based crowdfunding success with multimodal data: A theory-guided framework
    Bao, Liqian
    Chen, Gang
    Liu, Zongxi
    Xiao, Shuaiyong
    Zhao, Huimin
    INFORMATION & MANAGEMENT, 2025, 62 (04)
  • [10] Intrinsic and extrinsic quality of data for open data repositories
    Gonzalez-Vidal, Aurora
    Ramallo-Gonzalez, Alfonso P.
    Skarmeta, Antonio F.
    ICT EXPRESS, 2022, 8 (03): : 328 - 333