Priority-Lasso: a simple hierarchical approach to the prediction of clinical outcome using multi-omics data

Cited by: 43
Authors
Klau, Simon [1]
Jurinovic, Vindi [1]
Hornung, Roman [1]
Herold, Tobias [2]
Boulesteix, Anne-Laure [1]
Affiliations
[1] Univ Munich, Inst Med Informat Proc Biometry & Epidemiol, Munich, Germany
[2] Univ Munich, Dept Internal Med 3, Munich, Germany
Keywords
Cox regression; Lasso; Multi-omics data; Penalized regression; Prediction model; Priority-lasso; ACUTE MYELOID-LEUKEMIA; REGULARIZATION PATHS; VARIABLE SELECTION; CLASSIFICATION; SURVIVAL; MODELS; RISK; VALIDATION; THERAPY
DOI
10.1186/s12859-018-2344-6
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The inclusion of high-dimensional omics data in prediction models has become a well-studied topic in the last decades. Although most of these methods do not account for possibly different types of variables in the set of covariates available in the same dataset, there are many such scenarios where the variables can be structured in blocks of different types, e.g., clinical, transcriptomic, and methylation data. To date, there exist a few computationally intensive approaches that make use of block structures of this kind.

Results: In this paper we present priority-Lasso, an intuitive and practical analysis strategy for building prediction models based on Lasso that takes such block structures into account. It requires the definition of a priority order of blocks of data. Lasso models are calculated successively for every block and the fitted values of every step are included as an offset in the fit of the next step. We apply priority-Lasso in different settings on an acute myeloid leukemia (AML) dataset consisting of clinical variables, cytogenetics, gene mutations and expression variables, and compare its performance on an independent validation dataset to the performance of standard Lasso models.

Conclusion: The results show that priority-Lasso is able to keep pace with Lasso in terms of prediction accuracy. Variables of blocks with higher priorities are favored over variables of blocks with lower priority, which results in easily usable and transportable models for clinical practice.
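
The successive, offset-based fitting described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian outcome (where using the previous fitted values as an offset is equivalent to fitting the next Lasso on the residuals), fixed penalty values instead of cross-validated ones, and a small hand-written coordinate-descent solver. The names `lasso_cd`, `priority_lasso`, `blocks`, and `lambdas` are hypothetical, and the toy data are simulated. The Cox regression setting listed in the keywords is not covered here.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, r, lam, n_iter=200):
    """Minimise (1/2n) * sum((r - b0 - X b)^2) + lam * sum(|b|) by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = sum(r) / n
    resid = [ri - b0 for ri in r]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        # Unpenalised intercept: absorb the mean of the current residuals.
        shift = sum(resid) / n
        b0 += shift
        resid = [e - shift for e in resid]
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return b0, beta


def priority_lasso(X, y, blocks, lambdas):
    """Fit one Lasso per block in priority order, carrying fitted values as an offset."""
    n = len(X)
    offset = [0.0] * n
    fits = []
    for cols, lam in zip(blocks, lambdas):
        Xb = [[row[j] for j in cols] for row in X]
        # Gaussian case: fitting with an offset equals fitting on y - offset.
        target = [y[i] - offset[i] for i in range(n)]
        b0, beta = lasso_cd(Xb, target, lam)
        offset = [
            offset[i] + b0 + sum(b * x for b, x in zip(beta, Xb[i]))
            for i in range(n)
        ]
        fits.append({"cols": cols, "intercept": b0, "beta": beta, "fitted": list(offset)})
    return fits


# Simulated toy data: columns 0-1 form the high-priority block, columns 2-3 the next one.
random.seed(0)
n_obs = 60
X = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(n_obs)]
y = [2.0 * row[0] + 1.5 * row[3] + random.gauss(0.0, 0.1) for row in X]

blocks = [[0, 1], [2, 3]]   # priority order of the blocks (hypothetical)
lambdas = [0.1, 0.1]        # fixed penalties; cross-validation would normally choose these
fits = priority_lasso(X, y, blocks, lambdas)
```

Because each later block only sees what the earlier blocks left unexplained, variables in higher-priority blocks get the first chance to account for the outcome, which matches the favoring of high-priority blocks noted in the abstract.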
Pages: 14