Multi-modality risk prediction of cardiovascular diseases for breast cancer cohort in the All of Us Research Program

Cited by: 1
Authors
Yang, Han [1]
Zhou, Sicheng [1]
Rao, Zexi [2]
Zhao, Chen [2]
Cui, Erjia [2]
Shenoy, Chetan [3]
Blaes, Anne H. [4]
Paidimukkala, Nishitha [1]
Wang, Jinhua [5]
Hou, Jue [2]
Zhang, Rui [6]
Affiliations
[1] Univ Minnesota, Inst Hlth Informat, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Sch Publ Hlth, Div Biostat & Hlth Data Sci, 2221 Univ Ave SE,Suite 200, Minneapolis, MN 55414 USA
[3] Univ Minnesota, Med Ctr, Dept Med, Cardiovasc Div, Minneapolis, MN 55455 USA
[4] Univ Minnesota, Div Hematol Oncol & Transplantat, Minneapolis, MN 55455 USA
[5] Univ Minnesota, Masonic Canc Ctr, Minneapolis, MN 55455 USA
[6] Univ Minnesota, Dept Surg, Div Comp Hlth Sci, 308 Harvard St SE, Minneapolis, MN 55455 USA
Funding
National Institutes of Health (USA);
Keywords
cardiovascular disease; breast cancer; predictive model; All of Us; SOCIAL DETERMINANTS; SURVIVAL; MODELS; TIME; ASSOCIATIONS; STATEMENT; SELECTION; IMPACT; INDEX; LASSO;
DOI
10.1093/jamia/ocae199
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Objective: This study leverages the diversity of the All of Us Research Program (All of Us) dataset to devise a predictive model for cardiovascular disease (CVD) in breast cancer (BC) survivors. Central to this endeavor is the creation of a robust data integration pipeline that synthesizes electronic health records (EHRs), patient surveys, and genomic data, while upholding fairness across demographic variables.
Materials and Methods: We developed a universal data wrangling pipeline to process and merge heterogeneous data sources of the All of Us dataset, address missingness and variance in the data, and align disparate data modalities into a coherent framework for analysis. Using a composite feature set that includes EHR, lifestyle, and social determinants of health (SDoH) data, we then employed Adaptive Lasso and Random Forest regression models to predict 6 CVD outcomes. The models were evaluated using the c-index and the time-dependent Area Under the Receiver Operating Characteristic Curve over a 10-year period.
Results: The Adaptive Lasso model showed consistent performance across most CVD outcomes, while the Random Forest model performed particularly well in predicting outcomes such as transient ischemic attack when incorporating the full multi-modal feature set. Feature importance analysis revealed age and previous coronary events as dominant predictors across CVD outcomes, with SDoH clustering labels highlighting the nuanced impact of social factors.
Discussion: The development of both a Cox-based predictive model and a Random Forest regression model illustrates how All of Us can be used to integrate EHRs and patient surveys to enhance precision medicine. The inclusion of SDoH clustering labels revealed the significant impact of sociobehavioral factors on patient outcomes, emphasizing the importance of comprehensive health determinants in predictive models. Despite these advancements, limitations include the exclusion of genetic data, the broad categorization of CVD conditions, and the need for fairness analyses to ensure equitable model performance across diverse populations. Future work should refine clinical and social variable measurements, incorporate advanced imputation techniques, and explore additional predictive algorithms to enhance model precision and fairness.
Conclusion: This study demonstrates the viability of the diverse All of Us dataset for developing a multi-modality predictive model for CVD risk stratification in BC survivors. The data integration pipeline and subsequent predictive models establish a methodological foundation for future research into personalized healthcare.
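Note on the evaluation metric: the abstract reports the c-index without defining it. The following is a minimal sketch of Harrell's concordance index for right-censored data, assuming a higher risk score means earlier expected event. The function name, argument names, and toy data are illustrative assumptions and are not taken from the paper; tied observation times are simply skipped here, which a production implementation would handle more carefully.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's c-index (illustrative, not the authors' code).

    times:       observed follow-up time per subject
    events:      1 if the event was observed, 0 if censored
    risk_scores: model output, higher = higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored: pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four subjects, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times by predicted risk.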
Pages: 2800-2810
Number of pages: 11
Related papers
50 in total
  • [31] Multi-modality data fusion aids early detection of breast cancer using conventional technology and advanced digital infrared imaging
    Arena, F
    DiCicco, T
    Anand, A
    PROCEEDINGS OF THE 26TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-7, 2004, 26 : 1170 - 1173
  • [32] Breast cancer risk predictions by birth cohort and ethnicity in a population-based screening mammography program
    Epp, Joyce
    Rajapakshe, Rasika
    BRITISH JOURNAL OF RADIOLOGY, 2022, 95 (1136):
  • [33] Association of Longitudinal Activity Measures and Diabetes Risk: An Analysis From the National Institutes of Health All of Us Research Program
    Perry, Andrew S.
    Annis, Jeffrey S.
    Master, Hiral
    Nayor, Matthew
    Hughes, Andrew
    Kouame, Aymone
    Natarajan, Karthik
    Marginean, Kayla
    Murthy, Venkatesh
    Roden, Dan M.
    Harris, Paul A.
    Shah, Ravi
    Brittain, Evan L.
    JOURNAL OF CLINICAL ENDOCRINOLOGY & METABOLISM, 2023, 108 (05): : 1101 - 1109
  • [34] All-cause mortality, cardiovascular mortality, and incidence of cardiovascular disease according to a screening program of cardiovascular risk in South Korea among young adults: a nationwide cohort study
    Yun, J. M.
    Choi, S.
    Kim, K.
    Kim, S. M.
    Son, J. S.
    Lee, G.
    Jeong, S-M
    Park, S. Y.
    Kim, Y-Y
    Park, S. M.
    PUBLIC HEALTH, 2021, 190 : 23 - 29
  • [35] Addition of polygenic risk score to a risk calculator for prediction of breast cancer in US Black women
    Gary R. Zirpoli
    Ruth M. Pfeiffer
    Kimberly A. Bertrand
    Dezheng Huo
    Kathryn L. Lunetta
    Julie R. Palmer
    Breast Cancer Research, 26
  • [36] Addition of polygenic risk score to a risk calculator for prediction of breast cancer in US Black women
    Zirpoli, Gary R.
    Pfeiffer, Ruth M.
    Bertrand, Kimberly A.
    Huo, Dezheng
    Lunetta, Kathryn L.
    Palmer, Julie R.
    BREAST CANCER RESEARCH, 2024, 26 (01)
  • [37] Lifestyle Factors, Genetic Risk, and Cardiovascular Disease Risk among Breast Cancer Survivors: A Prospective Cohort Study in UK Biobank
    Peng, Hexiang
    Wang, Siyue
    Wang, Mengying
    Wang, Xueheng
    Guo, Huangda
    Huang, Jie
    Wu, Tao
    NUTRIENTS, 2023, 15 (04)
  • [38] Cohort Profile: The Karolinska Mammography Project for Risk Prediction of Breast Cancer (KARMA)
    Gabrielson, Marike
    Eriksson, Mikael
    Hammarstrom, Mattias
    Borgquist, Signe
    Leifland, Karin
    Czene, Kamila
    Hall, Per
    INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 2017, 46 (06) : 1740 - +
  • [39] Computational phenotyping with the All of Us Research Program: identifying underrepresented people with HIV or at risk of HIV
    Yang, Xueying
    Zhang, Jiajia
    Cai, Ruilie
    Liang, Chen
    Olatosi, Bankole
    Weissman, Sharon
    Li, Xiaoming
    JAMIA OPEN, 2023, 6 (03)
  • [40] Impact of cumulative body mass index and cardiometabolic diseases on survival among patients with colorectal and breast cancer: a multi-centre cohort study
    Mirjam Kohls
    Heinz Freisling
    Hadrien Charvat
    Isabelle Soerjomataram
    Vivian Viallon
    Veronica Davila-Batista
    Rudolf Kaaks
    Renée Turzanski-Fortner
    Krasimira Aleksandrova
    Matthias B. Schulze
    Christina C. Dahm
    Helene Tilma Vistisen
    Agnetha Linn Rostgaard-Hansen
    Anne Tjønneland
    Catalina Bonet
    Maria-Jose Sánchez
    Sandra Colorado-Yohar
    Giovanna Masala
    Domenico Palli
    Vittorio Krogh
    Fulvio Ricceri
    Olov Rolandsson
    Sai San Moon Lu
    Konstantinos K. Tsilidis
    Elisabete Weiderpass
    Marc J. Gunter
    Pietro Ferrari
    Ursula Berger
    Melina Arnold
    BMC Cancer, 22