Applying probabilistic temporal and multisite data quality control methods to a public health mortality registry in Spain: a systematic approach to quality control of repositories

Cited by: 25
Authors
Saez, Carlos [1 ,2 ]
Zurriaga, Oscar [3 ,4 ,5 ]
Perez-Panades, Jordi [3 ]
Melchor, Inma [3 ]
Robles, Montserrat [1 ]
Garcia-Gomez, Juan M. [1 ,6 ]
Affiliations
[1] Univ Politecn Valencia, Inst Univ Aplicac Tecnols Informac & Comunicac Av, Camino Vera S-N, Valencia 46022, Spain
[2] Univ Porto, Ctr Hlth Technol & Serv Res, Oporto, Portugal
[3] Conselleria Sanidad, Direcc Gen Salud Publ, Valencia, Spain
[4] Conselleria Sanidad, FISABIO Salud Publ, Valencia, Spain
[5] CIBERESP, Madrid, Spain
[6] Hosp Univ Politecn La Fe, Unidad Mixta Invest TICs Aplicadas Reingenieria P, Inst Invest Sanitaria, Valencia, Spain
Keywords
data reuse; multisite repositories; data quality; data monitoring; statistical data analysis; data mining; MUTUAL INFORMATION; VARIABILITY; MANAGEMENT; MODEL;
DOI
10.1093/jamia/ocw010
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Objective: To assess the variability in data distributions among data sources and over time through a case study of a large multisite repository as a systematic approach to data quality (DQ).
Materials and Methods: Novel probabilistic DQ control methods based on information theory and geometry are applied to the Public Health Mortality Registry of the Region of Valencia, Spain, with 512 143 entries from 2000 to 2012, disaggregated into 24 health departments. The methods provide DQ metrics and exploratory visualizations for (1) assessing the variability among multiple sources and (2) monitoring and exploring changes with time. The methods are suited to big data and multitype, multivariate, and multimodal data.
Results: The repository was partitioned into 2 probabilistically separated temporal subgroups following a change in the Spanish National Death Certificate in 2009. Punctual temporal anomalies were noticed due to a punctual increment in the missing data, along with outlying and clustered health departments due to differences in populations or in practices.
Discussion: Changes in protocols, differences in populations, biased practices, or other systematic DQ problems affected data variability. Even if semantic and integration aspects are addressed in data sharing infrastructures, probabilistic variability may still be present. Solutions include fixing or excluding data and analyzing different sites or time periods separately. A systematic approach to assessing temporal and multisite variability is proposed.
Conclusion: Multisite and temporal variability in data distributions affects DQ, hindering data reuse, and an assessment of such variability should be a part of systematic DQ procedures.
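As an illustration of the multisite comparison described in the abstract, the sketch below measures how far apart the categorical distributions of several sources are. The abstract does not name a specific metric; this example assumes the Jensen-Shannon distance as one information-theoretic choice, and the department names, categories, and counts are hypothetical toy data, not values from the registry.

```python
import math
from collections import Counter


def distribution(values, categories):
    """Relative frequency of each category in a non-empty list of values."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logs, bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero mass contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


# Hypothetical toy data: one categorical variable recorded by three sources.
categories = ["A", "B", "C"]
sources = {
    "dept_1": ["A"] * 50 + ["B"] * 30 + ["C"] * 20,
    "dept_2": ["A"] * 48 + ["B"] * 32 + ["C"] * 20,
    "dept_3": ["A"] * 10 + ["B"] * 10 + ["C"] * 80,
}

dists = {name: distribution(vals, categories) for name, vals in sources.items()}
names = sorted(dists)

# Pairwise distances between sources; a source that is far from all the
# others is a candidate outlier worth inspecting.
pairwise = {
    (a, b): js_distance(dists[a], dists[b])
    for a in names
    for b in names
    if a < b
}

for pair, d in pairwise.items():
    print(pair, round(d, 3))
```

In this toy setup, dept_3 sits much farther from the other two sources than they sit from each other. The same distance could be computed between consecutive time batches of a single source to look for temporal changes.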
Pages: 1085-1095
Page count: 11
Related papers
13 items in total
  • [1] Applying Control Chart Methods to Enhance Data Quality
    Jones-Farmer, L. Allison
    Ezell, Jeremy D.
    Hazen, Benjamin T.
    TECHNOMETRICS, 2014, 56 (01) : 29 - 41
  • [2] Computationally Assisted Quality Control for Public Health Data Streams
    Joshi, Ananya
    Mazaitis, Kathryn
    Rosenfeld, Roni
    Wilder, Bryan
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 6004 - 6012
  • [3] Quality control of fighting fish nucleotide sequences in public repositories reveals a dark matter of systematic taxonomic implication
    Thitipong Panthum
    Nattakan Ariyaphong
    Pish Wattanadilokchatkun
    Worapong Singchat
    Syed Farhan Ahmad
    Ekaphan Kraichak
    Sahabhop Dokkaew
    Narongrit Muangmai
    Kyudong Han
    Prateep Duengkae
    Kornsorn Srikulnath
    Genes & Genomics, 2023, 45 : 169 - 181
  • [4] Automated quality control methods for sensor data: a novel observatory approach
    Taylor, J. R.
    Loescher, H. L.
    BIOGEOSCIENCES, 2013, 10 (07) : 4957 - 4971
  • [5] Quality control of fighting fish nucleotide sequences in public repositories reveals a dark matter of systematic taxonomic implication
    Panthum, Thitipong
    Ariyaphong, Nattakan
    Wattanadilokchatkun, Pish
    Singchat, Worapong
    Ahmad, Syed Farhan
    Kraichak, Ekaphan
    Dokkaew, Sahabhop
    Muangmai, Narongrit
    Han, Kyudong
    Duengkae, Prateep
    Srikulnath, Kornsorn
    GENES & GENOMICS, 2023, 45 (02) : 169 - 181
  • [6] Strategies for quality control in gynecologic cytology diagnosis: Performance of 3 methods in a public health laboratory
    de Castro Ferraz, M. G. Mattosinho
    Utagawa, M. L.
    Longatto Filho, A.
    Shirata, N. K.
    di Loreto, C.
    Agnol, M. D.
    ACTA CYTOLOGICA, 2007, 51 (02) : 326 - 326
  • [7] PERFORMANCE IMPROVEMENT IN HEALTH-CARE ORGANIZATIONS - VARIABILITY IN CLINICAL-SYSTEMS - APPLYING MODERN QUALITY-CONTROL METHODS TO HEALTH-CARE
    BANKS, NJ
    PALMER, RH
    BERWICK, DM
    PLSEK, P
    JOINT COMMISSION JOURNAL ON QUALITY IMPROVEMENT, 1995, 21 (08) : 407 - 417
  • [8] Surveillance of SARS-CoV-2 RNA in wastewater: Methods optimization and quality control are crucial for generating reliable public health information
    Ahmed, Warish
    Bivins, Aaron
    Bertsch, Paul M.
    Bibby, Kyle
    Choi, Phil M.
    Farkas, Kata
    Gyawali, Pradip
    Hamilton, Kerry A.
    Haramoto, Eiji
    Kitajima, Masaaki
    Simpson, Stuart L.
    Tandukar, Sarmila
    Thomas, Kevin V.
    Mueller, Jochen F.
    CURRENT OPINION IN ENVIRONMENTAL SCIENCE & HEALTH, 2020, 17 : 82 - 93
  • [9] Health impact of China's Air Pollution Prevention and Control Action Plan: an analysis of national air quality monitoring and mortality data
    Huang, Jing
    Pan, Xiaochuan
    Guo, Xinbiao
    Li, Guoxing
    LANCET PLANETARY HEALTH, 2018, 2 (07) : E313 - E323
  • [10] Advanced Data Systems for Energy Consumption Optimization and Air Quality Control in Smart Public Buildings Using a Versatile Open Source Approach
    Starace, Giuseppe
    Tiwari, Amber
    Colangelo, Gianpiero
    Massaro, Alessandro
    ELECTRONICS, 2022, 11 (23)