Multivariate longitudinal data for survival analysis of cardiovascular event prediction in young adults: insights from a comparative explainable study

Cited by: 5
Authors
Nguyen, Hieu T. [1]
Vasconcellos, Henrique D. [2]
Keck, Kimberley [2]
Reis, Jared P. [3]
Lewis, Cora E. [4]
Sidney, Steven [5]
Lloyd-Jones, Donald M. [6]
Schreiner, Pamela J. [7]
Guallar, Eliseo [8]
Wu, Colin O. [3]
Lima, Joao A. C. [2, 9]
Ambale-Venkatesh, Bharath [9]
Affiliations
[1] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Dept Cardiol, Baltimore, MD USA
[3] NHLBI, Bldg 10, Bethesda, MD 20892 USA
[4] Univ Alabama Birmingham, Sch Publ Hlth, Dept Epidemiol, Birmingham, AL 35294 USA
[5] Kaiser Permanente, Div Res, Oakland, CA USA
[6] Northwestern Univ, Dept Prevent Med, Chicago, IL 60611 USA
[7] Univ Minnesota, Sch Publ Hlth, Minneapolis, MN USA
[8] Johns Hopkins Univ, Sch Publ Hlth, Dept Epidemiol, Baltimore, MD 21205 USA
[9] Johns Hopkins Univ, Dept Radiol, Baltimore, MD 21218 USA
Keywords
Longitudinal data; Explainable AI; Survival analysis; Risk prediction; Repeated measures; Personalized medicine; Time-varying covariates; SHAP; TIME; CARDIA; Time-to-event; Electronic health records; Blood pressure; Heart failure; Joint model; Middle age; Disease; Trajectories; Package
DOI
10.1186/s12874-023-01845-4
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Background: Multivariate longitudinal data are under-utilized for survival analysis compared to cross-sectional (CS) data, that is, data collected once across the cohort. Particularly in cardiovascular risk prediction, despite available methods of longitudinal data analysis, the value of longitudinal information has not been established in terms of improved predictive accuracy and clinical applicability.
Methods: We investigated the value of longitudinal data over and above the use of cross-sectional data via 6 distinct modeling strategies from statistics, machine learning, and deep learning that incorporate repeated measures for survival analysis of the time to cardiovascular event in the Coronary Artery Risk Development in Young Adults (CARDIA) cohort. We then examined and compared the use of model-specific interpretability methods (Random Survival Forest Variable Importance) and model-agnostic methods (SHapley Additive exPlanation (SHAP) and Temporal Importance Model Explanation (TIME)) in cardiovascular risk prediction using the top-performing models.
Results: In a cohort of 3539 participants, longitudinal information from 35 variables that were repeatedly collected in 6 exam visits over 15 years improved subsequent long-term (17 years after) risk prediction by up to 8.3% in C-index compared to using baseline data (0.78 vs. 0.72), and by up to approximately 4% compared to using the last observed CS data (0.75). Time-varying AUC was also higher in models using longitudinal data (0.86-0.87 at 5 years, 0.79-0.81 at 10 years) than in models using baseline or last observed CS data (0.80-0.86 at 5 years, 0.73-0.77 at 10 years). Comparative model interpretability analysis revealed the impact of longitudinal variables on model prediction on both the individual and global scales among different modeling strategies, and identified the best time windows and the best timing within those windows for event prediction. The most accurate strategy for incorporating longitudinal data was time series massive feature extraction, and the most easily interpretable strategy was trajectory clustering.
Conclusion: Our analysis demonstrates the added value of longitudinal data in predictive accuracy and epidemiological utility in cardiovascular risk survival analysis in young adults via a unified, scalable framework that compares model performance and explainability. The framework can be extended to a larger number of variables and other longitudinal modeling methods.
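The C-index figures quoted in the Results can be made concrete with a small sketch. The code below is not the authors' implementation: it is a minimal pure-Python version of Harrell's concordance index, run on invented toy data, plus a hypothetical helper `relative_gain_pct` that reproduces the arithmetic behind the reported gains, on the assumption that the 8.3% and roughly 4% figures are relative improvements (0.78 vs. 0.72 and 0.78 vs. 0.75).

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair is comparable when the subject with the shorter follow-up time
    had an observed event. Tied risk scores count as half-concordant.
    Pairs with tied follow-up times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored: pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def relative_gain_pct(new, old):
    """Relative improvement of `new` over `old`, in percent (hypothetical helper)."""
    return 100.0 * (new - old) / old


# Toy data, invented for illustration only (not CARDIA data).
times = [5, 8, 12, 15, 17]            # follow-up time in years
events = [1, 0, 1, 0, 0]              # 1 = event observed, 0 = censored
risks = [0.9, 0.4, 0.25, 0.2, 0.3]    # model-predicted risk scores

toy_c_index = harrell_c_index(times, events, risks)
gain_vs_baseline = relative_gain_pct(0.78, 0.72)
gain_vs_last_cs = relative_gain_pct(0.78, 0.75)

print(f"toy C-index: {toy_c_index:.3f}")
print(f"gain vs. baseline data: {gain_vs_baseline:.1f}%")
print(f"gain vs. last observed CS data: {gain_vs_last_cs:.1f}%")
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a move from 0.72 to 0.78 is a meaningful change in discrimination even though the absolute difference is only 0.06.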
Pages: 19