Generalized Bayesian Factor Analysis for Integrative Clustering with Applications to Multi-Omics Data

Cited by: 15
Authors
Min, Eun Jeong [1]
Chang, Changgee [1]
Long, Qi [1]
Affiliations
[1] Univ Penn, Dept Biostat Epidemiol & Informat, Philadelphia, PA 19104 USA
Source
2018 IEEE 5TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA) | 2018
Keywords
Generalized Bayesian Factor Analysis; Markov Random Field (MRF); Spike and Slab Lasso (SSL); Variational EM Algorithm; Structural Information; Network Information; Integrative Analysis; Integrative Clustering; High Dimensional Data; Omics Data; NCI60; LATENT VARIABLE MODEL; EXPRESSION PROFILES; SELECTION; MELANOGENESIS; INFORMATION; TRANSCRIPT; REGRESSION; DISCOVERY; INFERENCE; PATHWAYS;
DOI
10.1109/DSAA.2018.00021
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Integrative clustering is a clustering approach for multiple datasets that provide different views of a common group of subjects. It enables joint analysis of multi-omics data to, for example, identify subtypes of diseases or cells, capturing the complex underlying biological processes more precisely. Separately, over the past decade there has been a great deal of interest in incorporating prior structural knowledge on the features into statistical analyses. Knowledge of gene regulatory networks (pathways) can potentially be incorporated into many genomic studies. In this paper, we propose a novel integrative clustering method that can incorporate prior graph knowledge. We first develop a generalized Bayesian factor analysis (GBFA) framework, a sparse Bayesian factor analysis that can take the graph information into account. Our GBFA framework employs the spike and slab lasso (SSL) prior to impose sparsity on the factor loadings and the Markov random field (MRF) prior to encourage smoothing over adjacent factor loadings, which establishes a unified shrinkage adaptive to the loading size and the graph structure. We then use the framework to extend iCluster+, a factor-analysis-based integrative clustering approach. A novel variational EM algorithm is proposed to efficiently compute the MAP estimator of the factor loadings. Extensive simulation studies and an application to the NCI60 cell line dataset demonstrate that the proposed method is superior and delivers more biologically meaningful outcomes.
Pages: 109-119
Number of pages: 11