Validating the extract, transform, load process used to populate a large clinical research database

Cited by: 36
Authors
Denney, Michael J. [1]
Long, Dustin M. [2]
Armistead, Matthew G. [1]
Anderson, Jamie L. [3]
Conway, Baqiyyah N. [4]
Affiliations
[1] West Virginia Clin & Translat Sci Inst, Biomed Informat, Morgantown, WV USA
[2] West Virginia Univ, Dept Biostat, Morgantown, WV USA
[3] West Virginia Univ Healthcare, Dept Hlth Informat Management, Morgantown, WV USA
[4] West Virginia Univ, Dept Epidemiol, Morgantown, WV 26506 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
Correctness; Clinical data warehouse; Electronic health record; Extract transform load; Informatics;
DOI
10.1016/j.ijmedinf.2016.07.009
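Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract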
Background: Informaticians who develop clinical research support infrastructure are tasked with populating research databases with data extracted and transformed from their institution's operational databases, such as electronic health records (EHRs). These data must be properly extracted from the source systems, transformed into a standard data structure, and loaded into the data warehouse without compromising their integrity. We validated the correctness of the extract, transform, and load (ETL) process that populates the West Virginia Clinical and Translational Science Institute's Integrated Data Repository (IDR), a clinical data warehouse that includes data extracted from two EHR systems.
Methods: Four hundred ninety-eight observations were randomly selected from the IDR and compared with the two source EHR systems.
Results: Of the 498 observations, 479 were concordant and 19 were discordant. The discordant observations fell into three general categories: a) design decision differences between the IDR and the source EHRs, b) timing differences, and c) user interface settings. After the apparent discordances were resolved, the IDR was found to be 100% accurate relative to its source EHR systems.
Conclusion: Any institution whose clinical data warehouse is built by extraction from operational databases, such as EHRs, employs some form of ETL process. As secondary use of EHR data begins to transform the research landscape, the importance of basic validation of the extracted EHR data should not be underestimated, and that validation should start with the extraction process itself. (C) 2016 Elsevier Ireland Ltd. All rights reserved.
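The sample-and-compare step described in the Methods can be sketched roughly as follows. This is a minimal illustration only, not the authors' procedure: the function names `sample_and_compare` and `concordance_rate`, the dictionary layout, and the toy values are all assumptions made for the example. Only the counts 479 and 498 come from the abstract.

```python
import random

def sample_and_compare(warehouse, source, n, seed=0):
    """Randomly sample n warehouse observations and compare each with the source.

    `warehouse` and `source` are hypothetical dicts keyed by
    (patient_id, variable); the real IDR and EHR schemas are not described
    in the abstract.
    """
    rng = random.Random(seed)
    keys = rng.sample(sorted(warehouse), n)
    # An observation is discordant when the warehouse value differs from the source value.
    discordant = [k for k in keys if warehouse[k] != source.get(k)]
    return {"sampled": n, "concordant": n - len(discordant), "discordant": discordant}

def concordance_rate(concordant, total):
    """Fraction of sampled observations that match the source."""
    return concordant / total

# Toy data (invented for illustration): one value differs between the two systems.
warehouse = {("p1", "hba1c"): 6.1, ("p2", "hba1c"): 7.4,
             ("p3", "sbp"): 128, ("p4", "sbp"): 141}
source = {("p1", "hba1c"): 6.1, ("p2", "hba1c"): 7.4,
          ("p3", "sbp"): 130, ("p4", "sbp"): 141}

result = sample_and_compare(warehouse, source, n=4)

# Initial concordance reported in the abstract, before discordances were resolved.
initial_rate = concordance_rate(479, 498)
print(result, round(initial_rate, 3))
```

Each discordant key would then be reviewed by hand and assigned to a cause, such as the three categories named in the Results, before deciding whether it reflects a genuine ETL error.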
Pages: 271-274
Number of pages: 4