Sparse Canonical Covariance Analysis for High-throughput Data

Cited by: 19
Authors
Lee, Woojoo [1]
Lee, Donghwan [2]
Lee, Youngjo [2]
Pawitan, Yudi [1]
Affiliations
[1] Karolinska Inst, Dept Med Epidemiol & Biostat, Stockholm, Sweden
[2] Seoul Natl Univ, Coll Nat Sci, Dept Stat, Seoul 151742, South Korea
Keywords
canonical covariance analysis; sparsity; random-effect model; high-dimensional genomic data; variable selection; regression; shrinkage; gene
DOI
10.2202/1544-6115.1638
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Canonical covariance analysis (CCA) has gained popularity as a method for analyzing two sets of high-dimensional genomic data. However, the results are often difficult to interpret because canonical vectors are linear combinations of all variables, and the coefficients are typically nonzero. Several sparse CCA methods have recently been proposed to reduce the number of nonzero coefficients, but these methods are not satisfactory because they still yield too many nonzero coefficients. In this paper, we propose a new random-effect model approach for sparse CCA; the proposed algorithm can accommodate arbitrary penalty functions in CCA without much computational demand. Through simulation studies, we compare various penalty functions in terms of how well they identify the correct model. We also develop an extension of sparse CCA that handles more than two sets of variables measured on the same set of observations. We illustrate the method with an analysis of the NCI cancer dataset.
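To make the idea of sparse canonical vectors concrete, the following is a minimal Python sketch, not the random-effect algorithm proposed in the paper. It swaps in a plain L1 (soft-thresholding) penalty inside an alternating power iteration on the sample cross-covariance matrix, a common way to obtain sparse canonical covariance vectors. The function names (`soft_threshold`, `sparse_cca`), the parameters `lam_u` and `lam_v`, and the choice to express each threshold as a fraction of the largest absolute entry are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(a, frac):
    """Soft-threshold a at frac * max|a| (assumed L1-type penalty, 0 <= frac < 1)."""
    cutoff = frac * np.abs(a).max()
    return np.sign(a) * np.maximum(np.abs(a) - cutoff, 0.0)


def normalize(w):
    """Scale w to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def sparse_cca(X, Y, lam_u=0.3, lam_v=0.3, n_iter=100, tol=1e-8):
    """Illustrative first pair of sparse canonical covariance vectors (u, v).

    X is n x p and Y is n x q, with rows being the same n observations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (X.shape[0] - 1)  # sample cross-covariance, p x q

    # Start from the leading singular pair of C, i.e. the non-sparse solution.
    U, _, Vt = np.linalg.svd(C, full_matrices=False)
    u, v = U[:, 0], Vt[0]

    for _ in range(n_iter):
        # Alternate: update u with v fixed, then v with the new u fixed.
        u_new = normalize(soft_threshold(C @ v, lam_u))
        v_new = normalize(soft_threshold(C.T @ u_new, lam_v))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v
```

Larger values of `lam_u` and `lam_v` zero out more coefficients. Other penalty functions, such as those compared in the paper, would replace `soft_threshold` with a different thresholding rule; that substitution is not shown here.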
Pages: 25