Supervised group Lasso with applications to microarray data analysis

Cited by: 155
Authors
Ma, Shuangge [1]
Song, Xiao
Huang, Jian
Affiliations
[1] Yale Univ, Dept Epidemiol & Publ Hlth, New Haven, CT 06520 USA
[2] Univ Georgia, Dept Hlth Adm Biostat & Epidemiol, Athens, GA 30602 USA
[3] Yale Univ, Dept Epidemiol & Publ Hlth, New Haven, CT 06520 USA
DOI
10.1186/1471-2105-8-60
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: A tremendous amount of effort has been devoted to identifying genes for the diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have a cluster structure, where the clusters consist of coregulated genes that tend to have coordinated functions. However, most available statistical methods for gene selection do not take this cluster structure into account.
Results: We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross-validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross-validation. We apply the proposed method to disease classification and survival analysis with microarray data.
Conclusion: We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.
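The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several simplifying assumptions: a squared-error loss stands in for the logistic and Cox models that the paper's outcomes would call for, the clusters are supplied directly rather than found by K-means with the Gap method, the tuning parameters `lam1` and `lam2` are fixed rather than chosen by V-fold cross-validation, and the group penalty is weighted by the square root of the group size. All function names (`soft`, `lasso_cd`, `group_lasso_pg`, `supervised_group_lasso`) and the toy data are hypothetical.

```python
import math

def soft(z, t):
    """Soft-thresholding: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def column(X, j):
    return [row[j] for row in X]

def lasso_cd(X, y, genes, lam, n_iter=100):
    """Step 1 sketch: Lasso restricted to `genes`, by coordinate descent.
    Minimizes (1/2n)||y - Xb||^2 + lam * sum_j |b_j| (assumed loss)."""
    n = len(y)
    b = {j: 0.0 for j in genes}
    r = list(y)  # running residual y - Xb
    for _ in range(n_iter):
        for j in genes:
            xj = column(X, j)
            s = sum(v * v for v in xj) / n
            if s == 0.0:
                continue
            rho = sum(a * e for a, e in zip(xj, r)) / n + s * b[j]
            new = soft(rho, lam) / s
            r = [e - a * (new - b[j]) for a, e in zip(xj, r)]
            b[j] = new
    return b

def group_lasso_pg(X, y, groups, lam, n_iter=500):
    """Step 2 sketch: group Lasso by proximal gradient descent.
    Penalty (assumed form): lam * sum_g sqrt(|g|) * ||b_g||_2."""
    n = len(y)
    genes = [j for g in groups.values() for j in g]
    cols = {j: column(X, j) for j in genes}
    # Squared Frobenius norm / n is an upper bound on the gradient's
    # Lipschitz constant, so its inverse is a safe step size.
    step = n / sum(v * v for j in genes for v in cols[j])
    b = {j: 0.0 for j in genes}
    for _ in range(n_iter):
        r = [y[i] - sum(cols[j][i] * b[j] for j in genes) for i in range(n)]
        for g in groups.values():
            # Gradient step on the group's coefficients...
            v = [b[j] + step * sum(a * e for a, e in zip(cols[j], r)) / n
                 for j in g]
            # ...followed by block soft-thresholding of the whole group.
            norm = math.sqrt(sum(u * u for u in v))
            t = step * lam * math.sqrt(len(g))
            scale = max(0.0, 1.0 - t / norm) if norm > 0.0 else 0.0
            for j, u in zip(g, v):
                b[j] = scale * u
    return b

def supervised_group_lasso(X, y, clusters, lam1, lam2):
    # Step 1: within each cluster, keep genes with nonzero Lasso coefficients.
    kept = {}
    for c, genes in clusters.items():
        b = lasso_cd(X, y, genes, lam1)
        selected = [j for j in genes if b[j] != 0.0]
        if selected:
            kept[c] = selected
    if not kept:
        return {}
    # Step 2: group Lasso over the reduced clusters; drop all-zero clusters.
    b = group_lasso_pg(X, y, kept, lam2)
    return {c: {j: b[j] for j in g} for c, g in kept.items()
            if any(b[j] != 0.0 for j in g)}

# Toy data (hypothetical): 8 samples, 4 mutually orthogonal "genes".
x0 = [1, 1, 1, 1, -1, -1, -1, -1]
x1 = [1, 1, -1, -1, 1, 1, -1, -1]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
x3 = [a * c for a, c in zip(x1, x2)]
X = [list(row) for row in zip(x0, x1, x2, x3)]
y = [2.0 * a - 1.5 * c + 0.3 * d for a, c, d in zip(x0, x1, x2)]
clusters = {0: [0, 1], 1: [2, 3]}

result = supervised_group_lasso(X, y, clusters, lam1=0.1, lam2=0.5)
print(result)
```

In this toy example, step 1 keeps genes 0 and 1 in cluster 0 and gene 2 in cluster 1, and step 2 then drops cluster 1 entirely because its weak signal falls below the group threshold. A faithful version would replace the fixed `lam1` and `lam2` with values chosen by cross-validation at each step.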
Pages: 17