FIBER: enabling flexible retrieval of electronic health records data for clinical predictive modeling

Cited by: 1

Authors:

Datta, Suparno [1,2]

Sachs, Jan Philipp [1,2]

Cruz, Harry FreitasDa [1,2]

Martensen, Tom [1]

Bode, Philipp [1]

Sasso, Ariane Morassi [1,2]

Glicksberg, Benjamin S. [2,3]

Boettinger, Erwin [1,2]

Affiliations:

[1] Univ Potsdam, Digital Hlth Ctr, Hasso Plattner Inst, Rudolf Breitscheid Str 187, D-14482 Potsdam, Germany

[2] Icahn Sch Med Mt Sinai, Hasso Plattner Inst Digital Hlth Mt Sinai, New York, NY 10029 USA

[3] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY 10029 USA

Source:

JAMIA OPEN | 2021 / Vol. 4 / Issue 03

Funding:

National Institutes of Health (USA); European Union Horizon 2020

Keywords:

databases, factual; electronic health records; information storage and retrieval; workflow; software/instrumentation; ENTERPRISE

DOI:

10.1093/jamiaopen/ooab048

Chinese Library Classification (CLC) number:

R19 [Health care organizations and services (health services administration)]

Abstract:

Objectives: The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts are focused on data extraction and preprocessing steps required prior to modeling, including complex database queries. A handful of software libraries exist that can reduce this complexity by building upon data standards. However, a gap remains concerning electronic health records (EHRs) stored in star schema clinical data warehouses, an approach often adopted in practice. In this article, we introduce the FlexIBle EHR Retrieval (FIBER) tool: a Python library built on top of a star schema (i2b2) clinical data warehouse that enables flexible generation of modeling-ready cohorts as data frames. Materials and Methods: FIBER was developed on top of a large-scale star schema EHR database which contains data from 8 million patients and over 120 million encounters. To illustrate FIBER's capabilities, we present its application by building a heart surgery patient cohort with subsequent prediction of acute kidney injury (AKI) with various machine learning models. Results: Using FIBER, we were able to build the heart surgery cohort (n = 12 061), identify the patients that developed AKI (n = 1005), and automatically extract relevant features (n = 774). Finally, we trained machine learning models that achieved area under the curve values of up to 0.77 for this exemplary use case. Conclusion: FIBER is an open-source Python library developed for extracting information from star schema clinical data warehouses and reduces time-to-modeling, helping to streamline the clinical modeling process.
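The cohort-building idea described in the abstract (select patients by a condition in a star schema fact table, attach an outcome label, and return one row per patient) can be illustrated with a minimal sketch. This is not FIBER's API and not the actual i2b2 schema: the `patient` and `fact` tables, the concept codes, and the function names `make_demo_db` and `build_cohort` are hypothetical and chosen only for illustration, using Python's standard `sqlite3` module on an in-memory database.

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory star schema (hypothetical tables, toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension table: one row per patient.
        CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, age INTEGER);
        -- Fact table: one row per observed event, keyed by patient.
        CREATE TABLE fact (patient_id INTEGER, concept_code TEXT, day INTEGER);
        INSERT INTO patient VALUES (1, 64), (2, 71), (3, 58);
        INSERT INTO fact VALUES
            (1, 'HEART_SURGERY', 10),
            (1, 'AKI', 12),
            (2, 'HEART_SURGERY', 5),
            (3, 'AKI', 3);
        """
    )
    return conn


def build_cohort(conn, procedure_code, outcome_code):
    """Return one dict per patient who had `procedure_code`.

    The `outcome` flag is 1 if the patient has an `outcome_code` event
    on a later day than the procedure, otherwise 0.
    """
    query = """
        SELECT p.patient_id,
               p.age,
               MAX(CASE WHEN o.concept_code = ? THEN 1 ELSE 0 END) AS outcome
        FROM fact AS f
        JOIN patient AS p ON p.patient_id = f.patient_id
        LEFT JOIN fact AS o
               ON o.patient_id = f.patient_id
              AND o.concept_code = ?
              AND o.day > f.day
        WHERE f.concept_code = ?
        GROUP BY p.patient_id, p.age
        ORDER BY p.patient_id
    """
    rows = conn.execute(
        query, (outcome_code, outcome_code, procedure_code)
    ).fetchall()
    return [{"patient_id": r[0], "age": r[1], "outcome": r[2]} for r in rows]


conn = make_demo_db()
cohort = build_cohort(conn, "HEART_SURGERY", "AKI")
for row in cohort:
    print(row)
```

In this toy data, patient 3 is excluded because there is no surgery event, and only patient 1 has the outcome after the procedure. In practice the resulting rows would typically be loaded into a data frame and joined with further features before model training.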


Pages: 10
