PheW2P2V: a phenome-wide prediction framework with weighted patient representations using electronic health records

Cited by: 0
Authors
Guo, Jia [1]
Kiryluk, Krzysztof [2]
Wang, Shuang [1]
Affiliations
[1] Columbia Univ, Dept Biostat, 722 West 168th St,6th Floor, New York, NY 10032 USA
[2] Columbia Univ, Dept Med, New York, NY 10032 USA
Keywords
phenome-wide prediction; patient representations; electronic health records (EHRs); ASSOCIATION; POPULATION
DOI
10.1093/jamiaopen/ooae084
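Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services management)]
Discipline classification code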
Abstract
Objective: Electronic health records (EHRs) provide opportunities for the development of computable predictive tools. Conventional machine learning and deep learning methods have been widely used for this task, typically with one tool designed for one clinical outcome. Here we developed PheW2P2V, a Phenome-Wide prediction framework using Weighted Patient Vectors. PheW2P2V conducts tailored predictions for phenome-wide phenotypes using numeric representations of patients' past medical records, weighted by their similarity to each individual phenotype.
Materials and Methods: PheW2P2V defines clinical disease phenotypes using Phecode mapping based on International Classification of Diseases codes, which reduces redundancy and case-control misclassification in real-life EHR datasets. By upweighting the parts of a patient's medical records that are more relevant to a phenotype of interest when calculating patient vectors, PheW2P2V achieves tailored incidence risk prediction for that phenotype. The calculation of weighted patient vectors is computationally efficient, and the weighting mechanism ensures tailored predictions across the phenome. We evaluated the prediction performance of PheW2P2V and baseline methods with simulation studies and clinical applications using the MIMIC-III database.
Results: Across 942 phenome-wide predictions using the MIMIC-III database, PheW2P2V achieved a median area under the receiver operating characteristic curve (AUC-ROC) of 0.74 (baseline methods: <= 0.72), a median max F1-score of 0.20 (baseline methods: <= 0.19), and a median area under the precision-recall curve (AUC-PR) of 0.10 (baseline methods: <= 0.10).
Discussion: PheW2P2V can predict phenotypes efficiently by using medical concept embeddings and upweighting relevant past medical histories. By leveraging both labeled and unlabeled data, PheW2P2V reduces overfitting and improves predictions for rare phenotypes, making it a useful screening tool for early diagnosis of high-risk conditions, though further research is needed to assess the transferability of embeddings across different databases.
Conclusions: PheW2P2V is fast, flexible, and has superior prediction performance for many clinical disease phenotypes across the phenome of the MIMIC-III database compared to that of several popular baseline methods.
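The weighting idea described in the abstract can be illustrated with a small sketch. The abstract does not give the exact weighting formula, so everything below is an assumption made for illustration: the function names, the use of cosine similarity between concept embeddings and a phenotype embedding, the softmax normalization with a `temperature` parameter, and the toy 2-dimensional embeddings are all hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def weighted_patient_vector(concept_vectors, phenotype_vector, temperature=1.0):
    """Average a patient's concept embeddings, upweighting those closer to a phenotype.

    concept_vectors: array of shape (n_records, dim), one embedding per past
        medical concept in the patient's record (hypothetical input format).
    phenotype_vector: array of shape (dim,), embedding of the target phenotype.
    temperature: softmax temperature; an illustrative knob, not from the paper.
    """
    sims = np.array(
        [cosine_similarity(v, phenotype_vector) for v in concept_vectors]
    )
    # Softmax over similarities (assumed weighting scheme); subtracting the
    # maximum keeps the exponentials numerically stable.
    scaled = sims / temperature
    weights = np.exp(scaled - scaled.max())
    weights /= weights.sum()
    # Weighted average of the concept embeddings.
    return weights @ concept_vectors, weights


# Toy example: three past medical concepts and one target phenotype.
concepts = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
phenotype = np.array([1.0, 0.0])

vector, weights = weighted_patient_vector(concepts, phenotype)
unweighted = concepts.mean(axis=0)

print("weights:", np.round(weights, 3))
print("weighted patient vector:", np.round(vector, 3))
print("unweighted patient vector:", np.round(unweighted, 3))
```

In this toy case the concept aligned with the phenotype receives the largest weight, so the weighted patient vector points closer to the phenotype than the plain average does. A separate vector is computed per phenotype, which is one way to read the "tailored" predictions the abstract mentions; the downstream classifier is not sketched here.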
Pages: 9