Random Survival Forest in practice: a method for modelling complex metabolomics data in time to event analysis

Cited by: 75
Authors
Dietrich, Stefan [1]
Floegel, Anna [1]
Troll, Martina [2, 3]
Kuehn, Tilman [4]
Rathmann, Wolfgang [5, 6]
Peters, Anette [3, 6, 7]
Sookthai, Disorn [4]
von Bergen, Martin [8, 9]
Kaaks, Rudolf [4]
Adamski, Jerzy [6, 10, 11]
Prehn, Cornelia [10]
Boeing, Heiner [1]
Schulze, Matthias B. [6, 12]
Illig, Thomas [2, 13, 14]
Pischon, Tobias [1, 15]
Knueppel, Sven [1]
Wang-Sattler, Rui [2, 3, 6]
Drogan, Dagmar [1]
Affiliations
[1] German Inst Human Nutr Potsdam Rehbrucke, Dept Epidemiol, Arthur Scheunert Allee 114-116, DE-14558 Nuthetal, Germany
[2] German Res Ctr Environm Hlth, Helmholtz Zentrum Munchen, Res Unit Mol Epidemiol, Neuherberg, Germany
[3] German Res Ctr Environm Hlth, Helmholtz Zentrum Munchen, Inst Epidemiol 2, Neuherberg, Germany
[4] German Canc Res Ctr, Div Canc Epidemiol, Heidelberg, Germany
[5] Heinrich Heine Univ, Inst Biometr & Epidemiol, Leibniz Ctr Diabet Res, Dusseldorf, Germany
[6] German Ctr Diabet Res DZD, Munich, Germany
[7] Harvard Sch Publ Hlth, Dept Environm Hlth, Boston, MA USA
[8] Univ Leipzig, Dept Mol Syst Biol, Helmholtz Ctr Environm Res UFZ, Inst Biochem,Fac Biosci Pharm & Psychol, Leipzig, Germany
[9] Aalborg Univ, Dept Chem & Biosci, Aalborg, Denmark
[10] German Res Ctr Environm Hlth, Helmholtz Zentrum Munchen, Genome Anal Ctr, Inst Expt Genet, Munich, Germany
[11] Techn Univ Munchen, Lehrstuhl Expt Genet, Freising Weihenstephan, Germany
[12] German Inst Human Nutr, Dept Mol Epidemiol, Nuthetal, Germany
[13] Hannover Unified Biobank, Hannover, Germany
[14] Inst Human Genet, Hannover, Germany
[15] Max Delbruck Ctr Mol Med MDC Berlin Buch, Mol Epidemiol Grp, Berlin, Germany
Keywords
Cox proportional hazards regression; exploratory survival analysis; multicollinearity; random survival forest; right-censored data; metabolomics; type 2 diabetes mellitus; variable selection; TYPE-2; DIABETES-MELLITUS; SERUM METABOLOMICS; METABOLITE PROFILES; INSULIN-RESISTANCE; VARIABLE SELECTION; PREDICTION MODELS; EPIC-GERMANY; RISK; BIOMARKERS; MARKERS;
DOI
10.1093/ije/dyw145
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Subject classification code
1004; 120402
Abstract
Background: The application of metabolomics in prospective cohort studies is statistically challenging. Because selecting disease-associated metabolites from highly correlated, complex data requires appropriate statistical methods, we combined random survival forest (RSF) with an automated backward elimination procedure designed for such data. Methods: We illustrated the RSF approach with data from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study, using concentrations of 127 serum metabolites as exposure variables and time to development of type 2 diabetes mellitus (T2D) as the outcome variable. A Cox regression analysis with stepwise selection based on this data set was recently published. The comparison of methods (RSF versus Cox regression) was replicated in two independent cohorts. Finally, the R code for implementing the metabolite selection procedure in the RSF syntax is provided. Results: Applying the RSF approach in EPIC-Potsdam identified 16 metabolites associated with incident T2D; these slightly improved prediction of T2D when added to traditional T2D risk factors, and also when used together with classical biomarkers. The identified metabolites partly agreed with previous findings from Cox regression, although RSF selected a larger number of highly correlated metabolites. Conclusions: The RSF method appears to be a promising approach for identifying disease-associated variables in complex data with time to event as the outcome. The demonstrated RSF approach yields findings comparable to those of the commonly used Cox regression, while also addressing the problem of multicollinearity and remaining suitable for high-dimensional data.
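The abstract describes combining RSF with an automated backward elimination procedure, and notes that the authors provide R code for it. The following is not that code. It is a minimal Python sketch of a generic backward elimination loop, under several assumptions that the abstract does not confirm: that variables are ranked by an importance score, that a fixed fraction of the least important ones is dropped per step, and that the subset with the lowest prediction error is kept. The names `backward_eliminate`, `fit_and_score`, `drop_fraction`, and `toy_fit_and_score` are hypothetical; in a real analysis the callback would fit a survival forest and return its out-of-bag error and variable importances.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback type: given variable names, fit a model and return
# (prediction_error, {variable: importance}). A real one would fit an RSF.
FitFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def backward_eliminate(
    variables: Sequence[str],
    fit_and_score: FitFn,
    drop_fraction: float = 0.2,  # assumed step size, not taken from the paper
    min_vars: int = 1,
) -> Tuple[List[str], float]:
    """Refit repeatedly, dropping the least important variables each round,
    and return the variable subset that gave the lowest prediction error."""
    current = list(variables)
    best_vars, best_err = list(current), float("inf")
    while True:
        err, importance = fit_and_score(current)
        if err < best_err:
            best_vars, best_err = list(current), err
        if len(current) <= min_vars:
            break
        # Drop at least one variable, but never go below min_vars.
        n_drop = max(1, int(len(current) * drop_fraction))
        n_drop = min(n_drop, len(current) - min_vars)
        ranked = sorted(current, key=lambda v: importance[v])
        dropped = set(ranked[:n_drop])
        current = [v for v in current if v not in dropped]
    return best_vars, best_err


# Toy stand-in for model fitting (purely illustrative, no survival data):
# two "signal" variables lower the error, every other variable raises it.
SIGNAL = {"m1": 0.9, "m2": 0.6}


def toy_fit_and_score(names: List[str]) -> Tuple[float, Dict[str, float]]:
    importance = {v: SIGNAL.get(v, 0.0) for v in names}
    n_signal = sum(1 for v in names if v in SIGNAL)
    n_noise = len(names) - n_signal
    return 0.5 - 0.1 * n_signal + 0.01 * n_noise, importance


if __name__ == "__main__":
    selected, err = backward_eliminate(
        ["m1", "m2", "m3", "m4", "m5", "m6"], toy_fit_and_score
    )
    print(selected, round(err, 2))
```

Passing the model fit as a callback keeps the elimination loop independent of any particular survival-forest library, so the same loop could wrap whichever implementation is available.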
Pages: 1406-1420
Page count: 15