Sparse multivariate factor analysis regression models and its applications to integrative genomics analysis

Times cited: 8
Authors
Zhou, Yan [1]
Wang, Pei [2]
Wang, Xianlong [3]
Zhu, Ji [4]
Song, Peter X.-K. [4]
Affiliations
[1] Merck & Co Inc, North Wales, PA, USA
[2] Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
[3] Fred Hutchinson Cancer Research Center, 1124 Columbia St, Seattle, WA 98104, USA
[4] University of Michigan, Ann Arbor, MI 48109, USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
EM-blockwise coordinate descent; high-dimensional data; latent factors; regularization; COPY NUMBER ALTERATIONS; GENE-EXPRESSION; SELECTION; REVEALS; TARGET
DOI
10.1002/gepi.22018
Chinese Library Classification
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
The multivariate regression model is a useful tool for exploring complex associations between two kinds of molecular markers, which helps in understanding the biological pathways underlying disease etiology. For a set of correlated response variables, accounting for such dependency can increase statistical power. Motivated by integrative genomic data analyses, we propose a new methodology, the sparse multivariate factor analysis regression model (smFARM), in which correlations among response variables are assumed to follow a factor analysis model with latent factors. The proposed method not only addresses the challenge that the number of association parameters exceeds the sample size, but also adjusts for unobserved genetic and/or nongenetic factors that may conceal the underlying response-predictor associations. smFARM is implemented using the EM algorithm and the blockwise coordinate descent algorithm. The proposed methodology is evaluated and compared with existing methods through extensive simulation studies. Our results show that accounting for latent factors through smFARM can improve the sensitivity of signal detection and the accuracy of sparse association map estimation. We illustrate smFARM with two integrative genomics analyses, of a breast cancer dataset and an ovarian cancer dataset, assessing the relationship between DNA copy numbers and gene expression arrays to understand genetic regulatory patterns relevant to the disease. We identify two trans-hub regions: one in cytoband 17q12, whose amplification influences the RNA expression levels of important breast cancer genes, and the other in cytoband 9q21.32-33, which is associated with chemoresistance in ovarian cancer.
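The abstract describes smFARM as a sparse multivariate regression whose response correlations follow a latent factor model, fitted with the EM algorithm plus blockwise coordinate descent. The sketch below is only a simplified illustration of that general idea under assumptions of our own, not the authors' implementation: it assumes the form Y = XB + FΛᵀ + E with standard normal factor rows, a diagonal noise covariance Ψ, and an L1 penalty λ‖B‖₁. It alternates an E-step (posterior moments of the factors) with conditional M-steps: a per-response lasso fitted by coordinate descent, then closed-form loading and uniqueness updates. The function `fit_sparse_factor_regression` and all of its parameters are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso shrinkage operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_sparse_factor_regression(X, Y, k, lam, n_iter=100, n_cd=20, seed=0):
    """Hypothetical ECM-style sketch for Y = X B + F Lam^T + E.

    Assumptions (not taken from the paper): rows of F ~ N(0, I_k),
    rows of E ~ N(0, diag(psi)), and an L1 penalty lam * |B|_1 on B.
    """
    n, p = X.shape
    q = Y.shape[1]
    rng = np.random.default_rng(seed)
    B = np.zeros((p, q))
    Lam = 0.1 * rng.standard_normal((q, k))
    psi = np.var(Y, axis=0) + 1e-6
    col_sq = np.sum(X ** 2, axis=0)  # X_l^T X_l for each predictor l

    for _ in range(n_iter):
        # E-step: posterior moments of the latent factors given residuals.
        R = Y - X @ B
        LtPinv = Lam.T / psi                         # Lam^T Psi^{-1}, shape (k, q)
        V = np.linalg.inv(np.eye(k) + LtPinv @ Lam)  # shared posterior covariance
        M = R @ LtPinv.T @ V                         # posterior means, shape (n, k)

        # CM-step 1: with diagonal Psi, the B update splits into one lasso
        # per response column on the factor-adjusted target Z.
        Z = Y - M @ Lam.T
        for j in range(q):
            b = B[:, j]                              # view: updates write into B
            resid = Z[:, j] - X @ b
            for _ in range(n_cd):                    # coordinate descent sweeps
                for l in range(p):
                    if col_sq[l] == 0:
                        continue
                    resid += X[:, l] * b[l]          # partial residual without l
                    b[l] = soft_threshold(X[:, l] @ resid, lam * psi[j]) / col_sq[l]
                    resid -= X[:, l] * b[l]

        # CM-step 2: closed-form loadings and uniquenesses given the new B.
        R = Y - X @ B
        S = n * V + M.T @ M                          # sum_i E[f_i f_i^T]
        Lam = R.T @ M @ np.linalg.inv(S)
        psi = np.maximum(
            (np.sum(R * R, axis=0) - np.sum(Lam * (R.T @ M), axis=1)) / n, 1e-6
        )
    return B, Lam, psi
```

Because Ψ is assumed diagonal, the expected complete-data objective for B separates by response, so each column of B is a weighted lasso with threshold `lam * psi[j]`. The factor term enters only through the adjusted target Z, which is one simple way to "adjust for unobserved factors" before estimating sparse associations.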
Pages: 70-80
Number of pages: 11