Network-Constrained Group Lasso for High-Dimensional Multinomial Classification with Application to Cancer Subtype Prediction

Cited by: 0
Authors
Tian, Xinyu [1]
Wang, Xuefeng [2]
Chen, Jun [1, 3]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY USA
[2] SUNY Stony Brook, Dept Prevent Med, Stony Brook, NY USA
[3] Mayo Clin, Div Biomed Stat & Informat, Rochester, MN 55905 USA
Keywords
cancer subtype prediction; multinomial logit model; group lasso; network constraint; proximal gradient algorithm
DOI
10.4137/CIN.s17686
Chinese Library Classification
R73 [Oncology]
Discipline classification code
100214
Abstract
The classic multinomial logit model, commonly used for multiclass regression problems, is restricted to a small number of predictors and does not take the relationships among variables into account. It is therefore of limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features such as gene expression levels are usually related through an underlying biological network. Efficient use of this network information is important for improving both classification performance and biological interpretability. We proposed a multinomial logit model that addresses both the high dimensionality of the predictors and the underlying network information. A group lasso penalty was used to induce model sparsity, and a network constraint was imposed to induce smoothness of the coefficients with respect to the underlying network structure. To deal with the non-smoothness of the objective function during optimization, we developed a proximal gradient algorithm for efficient computation. The proposed model was compared with models that use no prior structure information, both in simulations and in a cancer subtype prediction problem using real gene expression data from The Cancer Genome Atlas (TCGA). The network-constrained model outperformed the traditional ones in both cases.
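The abstract does not give the algorithm's details, so the following is only a minimal sketch of the one step that a proximal gradient method typically uses to handle a non-smooth group lasso penalty: block (group) soft-thresholding. It assumes that each group is the list of coefficients one feature has across the classes; that grouping, the function names, and the parameters `step` and `lam` are illustrative assumptions and are not taken from the paper.

```python
import math


def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_2 (block soft-thresholding).

    Shrinks the whole group toward zero and sets it exactly to zero
    when its Euclidean norm does not exceed the threshold.
    """
    norm = math.sqrt(sum(x * x for x in group))
    if norm <= threshold:
        # The entire group is removed from the model.
        return [0.0 for _ in group]
    scale = 1.0 - threshold / norm
    return [scale * x for x in group]


def prox_group_lasso(coef, step, lam):
    """Apply block soft-thresholding to every group (row) of coef.

    coef: list of rows, one row per feature (assumed grouping).
    step: gradient step size; lam: group lasso penalty weight.
    """
    return [group_soft_threshold(row, step * lam) for row in coef]
```

In a full iteration, one would first take a gradient step on the smooth part of the objective (for example, the negative log-likelihood plus a smooth network penalty) and then pass the result through `prox_group_lasso`. Features whose coefficient rows are thresholded to zero drop out for all classes at once, which is how a group penalty yields feature-level sparsity.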
Pages: 25-33
Page count: 9