Machine learning approaches for electronic health records phenotyping: a methodical review

Times cited: 42

Authors:

Yang, Siyue [1]

Varghese, Paul [2]

Stephenson, Ellen [3]

Tu, Karen [3]

Gronsbell, Jessica [1,3,4,5]

Affiliations:

[1] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada

[2] Verily Life Sci, Cambridge, MA USA

[3] Univ Toronto, Dept Family & Community Med, Toronto, ON, Canada

[4] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada

[5] Univ Toronto, Dept Stat Sci, 700 Univ Ave, Toronto, ON M5G 1Z5, Canada

Source:

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION | 2023 / Vol. 30 / No. 02

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC)

Keywords:

electronic health records; phenotyping; cohort identification; machine learning; CLINICAL-TRIALS; INFORMATION; VALIDATION; ALGORITHMS; EXTRACTION; SELECTION; MODEL; TEXT;

DOI:

10.1093/jamia/ocac216

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Objective: Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used.

Materials and methods: We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies.

Results: Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning was applied to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions.

Discussion: Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes, and few evaluated external validity or used multi-institution data. Study settings were infrequently reported, and analytic code was rarely released.

Conclusion: Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods that account for phenotype misclassification caused by algorithm errors in downstream applications.

Pages: 367–381

Page count: 15

Cited references: 163

