Random forests for high-dimensional longitudinal data

Cited by: 67
Authors
Capitaine, Louis [1]
Genuer, Robin [1]
Thiebaut, Rodolphe [1]
Affiliation
[1] Bordeaux Univ, Bordeaux Populat Hlth Res Ctr, INRIA Bordeaux Sud Ouest, SISTM Team, INSERM, U1219, Bordeaux, France
Keywords
Stochastic mixed effects model; tree-based methods; high-dimensional data; repeated measurements; variable selection; classification; models
DOI
10.1177/0962280220946080
Chinese Library Classification
R19 [Health care organization and services (health services administration)]
Abstract
Random forests are among the state-of-the-art supervised machine learning methods and achieve good performance in high-dimensional settings where p, the number of predictors, is much larger than n, the number of observations. Repeated measurements generally provide additional information, so they are worth taking into account, especially when analyzing high-dimensional data. Tree-based methods have already been adapted to clustered and longitudinal data by using a semi-parametric mixed effects model, in which the non-parametric part is estimated using regression trees or random forests. We propose a general approach to random forests for high-dimensional longitudinal data. It includes a flexible stochastic model that allows the covariance structure to vary over time. Furthermore, we introduce a new method that takes intra-individual covariance into consideration when building random forests. Through simulation experiments, we then study the behavior of different estimation methods, especially in the context of high-dimensional data. Finally, the proposed method was applied to an HIV vaccine trial including 17 HIV-infected patients with 10 repeated measurements of 20,000 gene transcripts and blood concentration of human immunodeficiency virus RNA. The approach selected 21 gene transcripts for which the association with HIV viral load was fully relevant and consistent with results observed during primary infection.
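The abstract refers to tree-based methods built on a semi-parametric mixed effects model, where a random forest estimates the non-parametric part. The Python sketch below illustrates only that general idea, under simplifying assumptions. It follows an alternating scheme in the style of the mixed-effects random forest (MERF) algorithm, which is a technique swapped in for illustration and not the authors' proposed method. It assumes the model y_ij = f(X_ij) + b_i + eps_ij with a random intercept b_i ~ N(0, D) and noise eps_ij ~ N(0, sigma2). It omits the stochastic process that lets the covariance vary over time and builds an ordinary forest rather than the covariance-aware forest the authors introduce. The function name `merf_random_intercept`, the simulated data, and all parameter values are hypothetical. It relies on scikit-learn's `RandomForestRegressor` with `oob_score=True`, whose `oob_prediction_` attribute provides out-of-bag fitted values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def merf_random_intercept(X, y, groups, n_iter=10, n_trees=100, seed=0):
    """Hypothetical MERF-style fit of y_ij = f(X_ij) + b_i + eps_ij.

    Assumes b_i ~ N(0, D) and eps_ij ~ N(0, sigma2). The stochastic process
    term and the covariance-aware forest of the paper are NOT included.
    """
    _, g = np.unique(groups, return_inverse=True)  # row -> subject index 0..q-1
    q = g.max() + 1
    n_i = np.bincount(g, minlength=q).astype(float)  # visits per subject
    b = np.zeros(q)
    D, sigma2 = 1.0, 1.0  # starting values for the variance components
    for _ in range(n_iter):
        # Step 1: fit the forest to the response with current random effects removed.
        rf = RandomForestRegressor(n_estimators=n_trees, oob_score=True,
                                   random_state=seed)
        rf.fit(X, y - b[g])
        f_hat = rf.oob_prediction_  # out-of-bag estimate of f(X_ij)
        # Step 2: BLUP of each intercept, b_i = D * sum_j r_ij / (sigma2 + n_i * D).
        r_sum = np.bincount(g, weights=y - f_hat, minlength=q)
        denom = sigma2 + n_i * D
        b = D * r_sum / denom
        cond_var_b = D * sigma2 / denom  # Var(b_i | y_i) under the current parameters
        # Step 3: EM-type updates, using the variance components from before this update.
        eps = y - f_hat - b[g]
        sigma2_new = (eps @ eps + sigma2 * np.sum(n_i * D / denom)) / len(y)
        D = np.mean(b ** 2 + cond_var_b)
        sigma2 = sigma2_new
    return rf, b, D, sigma2


# Simulated longitudinal data: 30 subjects, 10 visits each, 20 predictors.
rng = np.random.default_rng(42)
q, m, p = 30, 10, 20
groups = np.repeat(np.arange(q), m)
X = rng.normal(size=(q * m, p))
b_true = rng.normal(scale=2.0, size=q)  # true random intercepts (D = 4)
y = (3 * X[:, 0] + np.sin(2 * X[:, 1]) + b_true[groups]
     + rng.normal(scale=0.5, size=q * m))

rf, b_hat, D_hat, sigma2_hat = merf_random_intercept(X, y, groups)
print("corr(b_hat, b_true):", np.corrcoef(b_hat, b_true)[0, 1])
print("D_hat:", D_hat, "sigma2_hat:", sigma2_hat)
```

The sketch passes out-of-bag predictions, rather than in-sample forest fits, to the random-effect step. In-sample fits tend to absorb much of the residual variation, which would shrink the estimated intercepts toward zero.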
Pages: 166-184
Number of pages: 19