A machine learning approach to leveraging electronic health records for enhanced omics analysis

Cited by: 7
Authors
Mataraso, Samson J. [1, 2, 3]
Espinosa, Camilo A. [1, 2, 3, 4]
Seong, David [1, 5]
Reincke, S. Momsen [1, 2, 3]
Berson, Eloise [1, 3, 6]
Reiss, Jonathan D. [2]
Kim, Yeasul [1, 2, 3]
Ghanem, Marc [1]
Shu, Chi-Hung [1]
James, Tomin [1]
Tan, Yuqi [6, 7]
Shome, Sayane [1, 2]
Stelzer, Ina A. [1, 8]
Feyaerts, Dorien [1]
Wong, Ronald J. [2]
Shaw, Gary M. [2]
Angst, Martin S. [1]
Gaudilliere, Brice [1]
Stevenson, David K. [2]
Aghaeepour, Nima [1, 2, 3]
Affiliations
[1] Stanford Univ, Sch Med, Dept Anesthesiol Perioperat & Pain Med, Stanford, CA 94305 USA
[2] Stanford Univ, Sch Med, Dept Pediat, Stanford, CA 94305 USA
[3] Stanford Univ, Sch Med, Dept Biomed Data Sci, Stanford, CA 94305 USA
[4] Stanford Univ, Sch Med, Immunol Program, Stanford, CA USA
[5] Stanford Univ, Sch Med, Med Scientist Training Program, Stanford, CA USA
[6] Stanford Univ, Sch Med, Dept Pathol, Stanford, CA USA
[7] Stanford Univ, Sch Med, Dept Microbiol & Immunol, Stanford, CA USA
[8] Univ Calif San Diego, Dept Pathol, La Jolla, CA USA
Funding
National Institutes of Health;
Keywords
CYSTATIN-C; PREECLAMPSIA; CELLS;
DOI
10.1038/s42256-024-00974-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Omics studies produce a large number of measurements, enabling the development, validation and interpretation of systems-level biological models. Large cohorts are required to power these complex models, yet cohort sizes remain limited by clinical and budgetary constraints. We introduce clinical and omics multimodal analysis enhanced with transfer learning (COMET), a machine learning framework that incorporates large, observational electronic health record databases and transfer learning to improve the analysis of small datasets from omics studies. By pretraining on electronic health record data and adaptively blending both early and late fusion strategies, COMET overcomes the limitations of existing multimodal machine learning methods. Using two independent datasets, we showed that COMET improved predictive modelling performance and biological discovery compared with analysing omics data with traditional methods. By incorporating electronic health record data into omics analyses, COMET enables more precise patient classifications, beyond the simplistic binary reduction to cases and controls. This framework can be broadly applied to the analysis of multimodal omics studies and reveals more powerful biological insights from limited cohort sizes.
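The abstract names COMET's two key ingredients, pretraining on electronic health record (EHR) data and adaptively blending early and late fusion, only at a high level. The toy Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. Plain logistic regression stands in for whatever models COMET actually uses, "pretraining" is approximated by warm-starting the EHR weights from a model fitted on a large EHR-only cohort, late fusion is assumed to be a simple average of per-modality predictions, and the "adaptive" blend is assumed to be a single weight `alpha` picked by grid search on validation log-loss. The function names (`fit_logistic`, `make_data`, `blended_fusion`) and the synthetic cohorts are hypothetical.

```python
# Toy sketch, NOT the COMET implementation: logistic regression, warm-start
# "pretraining" and a grid-searched blend weight are all illustrative assumptions.
import math
import random


def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Positive-class probability; w[0] is the bias, w[1:] the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, w_init=None, lr=0.5, epochs=300):
    """Batch gradient-descent logistic regression, optionally warm-started."""
    d = len(X[0])
    w = list(w_init) if w_init is not None else [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def log_loss(y, p, eps=1e-12):
    return -sum(yi * math.log(pi + eps) + (1 - yi) * math.log(1 - pi + eps)
                for yi, pi in zip(y, p)) / len(y)


def make_data(n, n_ehr, n_omics, rng):
    """Hypothetical synthetic cohort: label depends on EHR and (if present) omics."""
    ehr, omics, y = [], [], []
    for _ in range(n):
        e = [rng.gauss(0, 1) for _ in range(n_ehr)]
        o = [rng.gauss(0, 1) for _ in range(n_omics)]
        z = 1.5 * e[0] - 1.0 * e[1] + (1.2 * o[0] if o else 0.0)
        ehr.append(e)
        omics.append(o)
        y.append(1 if rng.random() < sigmoid(z) else 0)
    return ehr, omics, y


def blended_fusion(ehr_big, y_big, ehr_tr, om_tr, y_tr, ehr_va, om_va, y_va):
    # 1) Pretrain on the large EHR-only cohort.
    w_pre = fit_logistic(ehr_big, y_big)
    # 2) Early fusion: one model on concatenated features; EHR weights start
    #    from the pretrained values, omics weights start at zero.
    w_early = fit_logistic([e + o for e, o in zip(ehr_tr, om_tr)], y_tr,
                           w_init=w_pre + [0.0] * len(om_tr[0]))
    # 3) Late fusion: fine-tuned EHR model plus a separate omics model.
    w_ehr = fit_logistic(ehr_tr, y_tr, w_init=w_pre)
    w_om = fit_logistic(om_tr, y_tr)

    def blend(alpha, e, o):
        p_early = predict(w_early, e + o)
        p_late = 0.5 * (predict(w_ehr, e) + predict(w_om, o))
        return alpha * p_early + (1 - alpha) * p_late

    # 4) Pick the blend weight alpha that minimises validation log-loss.
    alphas = [k / 10 for k in range(11)]
    losses = [log_loss(y_va, [blend(a, e, o) for e, o in zip(ehr_va, om_va)])
              for a in alphas]
    best = alphas[losses.index(min(losses))]
    return best, lambda e, o: blend(best, e, o)


rng = random.Random(0)
ehr_big, _, y_big = make_data(500, 3, 0, rng)   # large EHR-only cohort
ehr_s, om_s, y_s = make_data(80, 3, 4, rng)     # small paired EHR + omics cohort
alpha, model = blended_fusion(ehr_big, y_big, ehr_s[:50], om_s[:50], y_s[:50],
                              ehr_s[50:], om_s[50:], y_s[50:])
print(f"alpha={alpha:.1f}, p(first patient)={model(ehr_s[0], om_s[0]):.3f}")
```

In this sketch, `alpha = 1` recovers pure early fusion and `alpha = 0` pure late fusion, so the validation search can fall back to either strategy; warm-starting lets the small omics cohort adjust, rather than relearn, the EHR weights learned from the large cohort.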
Pages: 293-306
Page count: 17