Combining structured and unstructured data in EMRs to create clinically-defined EMR-derived cohorts

Cited by: 11
Authors
Tam, Charmaine S. [1, 2, 13, 15]
Gullick, Janice [3]
Saavedra, Aldo [1, 4]
Vernon, Stephen T. [5, 6]
Figtree, Gemma A. [2, 5, 6]
Chow, Clara K. [7, 8]
Cretikos, Michelle [10]
Morris, Richard W. [1, 2]
William, Maged [11, 12]
Morris, Jonathan [2, 9]
Brieger, David [14]
Affiliations
[1] Univ Sydney, Ctr Translat Data Sci, Sydney, NSW, Australia
[2] Univ Sydney, Northern Clin Sch, Sydney, NSW, Australia
[3] Univ Sydney, Susan Wakil Sch Nursing & Midwifery, Sydney, NSW, Australia
[4] Univ Sydney, Fac Hlth Sci, Sydney, NSW, Australia
[5] Royal North Shore Hosp, Northern Sydney Local Hlth Dist, Kolling Inst Med Res, Cardiothorac & Vasc Hlth, Sydney, NSW, Australia
[6] Royal North Shore Hosp, Northern Sydney Local Hlth Dist, Dept Cardiol, Sydney, NSW, Australia
[7] Univ Sydney, Westmead Appl Res Ctr, Sydney, NSW, Australia
[8] Westmead Hosp, Dept Cardiol, Sydney, NSW, Australia
[9] Northern Sydney Local Hlth Dist, Clin & Populat Perinatal Hlth, Sydney, NSW, Australia
[10] NSW Minist Hlth, Ctr Populat Hlth, Sydney, NSW, Australia
[11] Cent Coast Local Hlth Dist, Dept Cardiol, Sydney, NSW, Australia
[12] Univ Newcastle, Sydney, NSW, Australia
[13] Northern Sydney Local Hlth Dist, Dept Obstet & Gynaecol, Sydney, NSW, Australia
[14] Concord Hosp, Dept Cardiol, Sydney, NSW, Australia
[15] Univ Sydney, Sch Comp Sci J12, Off 543,Level 5, Sydney, NSW 2006, Australia
Keywords
Electronic medical record; Cohort identification; Electronic phenotype; Acute coronary syndrome; ELECTRONIC HEALTH RECORDS; IDENTIFICATION; INFORMATION; VALIDATION; CHALLENGES; STRATEGIES
DOI
10.1186/s12911-021-01441-w
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Few studies have described how production EMR systems can be systematically queried to identify clinically-defined populations, and fewer still have used free text in this process. The aim of this study is to provide a generalisable methodology for constructing clinically-defined EMR-derived patient cohorts using structured and unstructured data in EMRs.
Methods: Patients with possible acute coronary syndrome (ACS) were used as an exemplar. Cardiologists defined clinical criteria for patients presenting with possible ACS. These were mapped to data tables within the production EMR system, creating seven inclusion criteria comprising structured data fields (orders and investigations, procedures, scanned electrocardiogram (ECG) images, and diagnostic codes) and unstructured clinical documentation. Data were extracted from two local health districts (LHDs) in Sydney, Australia. Outcome measures included the relative contribution of individual inclusion criteria to the identification of eligible encounters, comparisons between inclusion criteria, and the consistency of data extracts across years and LHDs.
Results: Among 802,742 encounters in a 5-year dataset (1/1/13-30/12/17), the presence of an ECG image (54.8% of encounters) and symptoms and keywords in clinical documentation (41.4-64.0%) were used most often to identify presentations of possible ACS. Orders and investigations (27.3%) and procedures (1.4%) were less often present for identified presentations. Relevant ICD-10/SNOMED CT codes were present for 3.7% of identified encounters. Similar trends were seen when the two LHDs were examined separately, and across years.
Conclusions: Combining structured and unstructured data when identifying clinically-defined EMR-derived cohorts is a necessary prerequisite for the critical validation work required to develop real-time clinical decision support and learning health systems.
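The cohort logic summarised in the abstract (an encounter is eligible if it meets any inclusion criterion, and each criterion's relative contribution is then reported) can be illustrated with a minimal Python sketch. The criterion names, the `flags` field and the toy encounters below are hypothetical and chosen only for illustration; they are not the study's seven criteria, its EMR tables or its data.

```python
from collections import Counter

# Hypothetical criterion names (assumption): the abstract groups its seven
# criteria into these broad kinds but does not list them individually.
CRITERIA = [
    "ecg_image",
    "order_or_investigation",
    "procedure",
    "diagnostic_code",
    "free_text_keyword",
]


def is_eligible(flags):
    """An encounter is eligible if it meets at least one inclusion criterion."""
    return any(flags.get(c, False) for c in CRITERIA)


def build_cohort(encounters):
    """Return only the encounters that meet any inclusion criterion."""
    return [e for e in encounters if is_eligible(e["flags"])]


def criterion_shares(cohort):
    """Percentage of cohort encounters on which each criterion is present."""
    counts = Counter(
        c for e in cohort for c in CRITERIA if e["flags"].get(c, False)
    )
    n = len(cohort)
    return {c: (100.0 * counts[c] / n if n else 0.0) for c in CRITERIA}


# Toy encounters (fabricated for illustration only, not study data).
encounters = [
    {"id": 1, "flags": {"ecg_image": True, "free_text_keyword": True}},
    {"id": 2, "flags": {"free_text_keyword": True}},
    {"id": 3, "flags": {}},
    {"id": 4, "flags": {"ecg_image": True, "diagnostic_code": True}},
]

cohort = build_cohort(encounters)
shares = criterion_shares(cohort)
for name in CRITERIA:
    print(f"{name}: {shares[name]:.1f}% of {len(cohort)} eligible encounters")
```

Because criteria can co-occur on the same encounter, the percentages need not sum to 100, which is consistent with how the abstract reports overlapping contributions.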
Pages: 10