TESTING HIGH-DIMENSIONAL COVARIANCE MATRICES, WITH APPLICATION TO DETECTING SCHIZOPHRENIA RISK GENES

Times cited: 26
Authors
Zhu, Lingxue [1]
Lei, Jing [1]
Devlin, Bernie [2]
Roeder, Kathryn [1]
Affiliations
[1] Carnegie Mellon Univ, Dept Stat, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh, Sch Med, Dept Psychiat & Human Genet, 3811 Ohara St, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA)
Keywords
Permutation test; high-dimensional data; covariance matrix; sparse principal component analysis; SPARSE PRINCIPAL COMPONENTS; EQUALITY; GROWTH
DOI
10.1214/17-AOAS1062
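Chinese Library Classification (CLC) codes
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714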
Abstract
Scientists routinely compare gene expression levels in cases versus controls, in part to determine genes associated with a disease. Similarly, detecting case-control differences in co-expression among genes can be critical to understanding complex human diseases; however, statistical methods have been limited by the high-dimensional nature of this problem. In this paper, we construct a sparse-Leading-Eigenvalue-Driven (sLED) test for comparing two high-dimensional covariance matrices. By focusing on the spectrum of the differential matrix, i.e., the difference between the two covariance matrices, sLED provides a novel perspective that accommodates what we assume to be common, namely sparse and weak signals in gene expression data, and it is closely related to sparse principal component analysis. We prove that sLED achieves full power asymptotically under mild assumptions, and simulation studies verify that it outperforms other existing procedures under many biologically plausible scenarios. Applying sLED to the largest gene-expression dataset obtained from post-mortem brain tissue from schizophrenia patients and controls, we provide a novel list of genes implicated in schizophrenia and reveal intriguing patterns in gene co-expression change for schizophrenia subjects. We also illustrate that sLED can be generalized to compare other gene-gene "relationship" matrices that are of practical interest, such as weighted adjacency matrices.
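The core idea of the abstract, a two-sample statistic built from the sparse leading eigenvalue of the differential matrix D = R(Y) - R(X) and calibrated by permuting group labels, can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The sparse eigenvalue is approximated with the truncated power method of Yuan and Zhang, which is a swapped-in solver that may differ from the sparse-PCA procedure used in the paper. The statistic takes the larger of the values for D and -D so that both gains and losses of co-expression count. All function names (`sparse_leading_eigenvalue`, `sled_statistic`, `sled_permutation_test`, `weighted_adjacency`), the sparsity level `k`, the permutation count, and the soft-threshold power `beta=6` for the adjacency variant are hypothetical choices.

```python
import numpy as np


def covariance(Z):
    """Sample covariance of an (n, p) data matrix (rows = samples)."""
    return np.cov(Z, rowvar=False)


def weighted_adjacency(Z, beta=6):
    """Hypothetical alternative relationship matrix: |correlation| ** beta (soft threshold)."""
    return np.abs(np.corrcoef(Z, rowvar=False)) ** beta


def _truncate(x, k):
    """Keep the k largest-magnitude entries of x, zero the rest, rescale to unit norm."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    out[keep] = x[keep]
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out


def sparse_leading_eigenvalue(A, k, n_iter=200, tol=1e-10):
    """Approximate max of v^T A v over unit vectors v with at most k nonzeros.

    Truncated power method on B = A + c I with c = ||A||_2, so B is PSD; the
    shift adds exactly c to v^T A v for every unit v, leaving the maximiser
    unchanged. The result is a heuristic lower bound, not a certified optimum.
    """
    evals, evecs = np.linalg.eigh(A)
    p = A.shape[0]
    B = A + np.max(np.abs(evals)) * np.eye(p)
    # Two starts: truncated leading eigenvector, and the coordinate with the largest A_jj.
    starts = [evecs[:, -1], np.eye(p)[np.argmax(np.diag(A))]]
    best = -np.inf
    for v in starts:
        v = _truncate(v, k)
        for _ in range(n_iter):
            v_new = _truncate(B @ v, k)
            if not v_new.any() or np.linalg.norm(v_new - v) < tol:
                break
            v = v_new
        best = max(best, float(v @ A @ v))
    return best


def sled_statistic(X, Y, k, relation=covariance):
    """Larger of the sparse leading eigenvalues of D and -D, with D = R(Y) - R(X)."""
    D = relation(Y) - relation(X)
    return max(sparse_leading_eigenvalue(D, k), sparse_leading_eigenvalue(-D, k))


def sled_permutation_test(X, Y, k, n_perm=100, relation=covariance, seed=0):
    """Permutation p-value for H0: the two groups share the same relationship matrix."""
    rng = np.random.default_rng(seed)
    # Center each group so a mean shift alone does not break exchangeability.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    t_obs = sled_statistic(X, Y, k, relation)
    Z = np.vstack([X, Y])
    n1 = X.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])  # random relabelling of pooled samples
        if sled_statistic(Z[idx[:n1]], Z[idx[n1:]], k, relation) >= t_obs:
            exceed += 1
    return t_obs, (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 150, 50
    X = rng.standard_normal((n, p))
    # Hypothetical signal: the first 5 variables become strongly co-expressed in Y only.
    cov_y = np.eye(p)
    cov_y[:5, :5] = 0.1 * np.eye(5) + 0.9
    Y = rng.multivariate_normal(np.zeros(p), cov_y, size=n)
    print(sled_permutation_test(X, Y, k=5))
```

The demo simulates a hypothetical case-control setting in which a small block of variables gains strong co-expression in one group only, which is the sparse signal the statistic targets. One would expect a small p-value there and p-values spread over (0, 1] when both samples come from the same distribution. Passing `relation=weighted_adjacency` compares adjacency matrices instead of covariance matrices, mirroring the generalization mentioned in the abstract.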
Pages: 1810-1831
Number of pages: 22