Generalized Bayesian Factor Analysis for Integrative Clustering with Applications to Multi-Omics Data

Cited by: 14
Authors
Min, Eun Jeong [1 ]
Chang, Changgee [1 ]
Long, Qi [1 ]
Affiliation
[1] Univ Penn, Dept Biostat Epidemiol & Informat, Philadelphia, PA 19104 USA
Keywords
Generalized Bayesian Factor Analysis; Markov Random Field (MRF); Spike and Slab Lasso (SSL); Variational EM Algorithm; Structural Information; Network Information; Integrative Analysis; Integrative Clustering; High Dimensional Data; Omics Data; NCI60; LATENT VARIABLE MODEL; EXPRESSION PROFILES; SELECTION; MELANOGENESIS; INFORMATION; TRANSCRIPT; REGRESSION; DISCOVERY; INFERENCE; PATHWAYS;
DOI
10.1109/DSAA.2018.00021
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Integrative clustering is a clustering approach for multiple datasets that provide different views of a common group of subjects. It enables joint analysis of multi-omics data, for example to identify subtypes of diseases or cells, and so captures the complex underlying biological processes more precisely. Separately, over the past decade there has been a great deal of interest in incorporating prior structural knowledge about the features into statistical analyses. Knowledge of gene regulatory networks (pathways) can potentially be incorporated into many genomic studies. In this paper, we propose a novel integrative clustering method that can incorporate prior graph knowledge. We first develop a generalized Bayesian factor analysis (GBFA) framework, a sparse Bayesian factor analysis that can take graph information into account. Our GBFA framework employs the spike and slab lasso (SSL) prior to impose sparsity on the factor loadings and the Markov random field (MRF) prior to encourage smoothing over adjacent factor loadings, which establishes a unified shrinkage adaptive to the loading size and the graph structure. Then, we use the framework to extend iCluster+, a factor analysis based integrative clustering approach. A novel variational EM algorithm is proposed to efficiently compute the MAP estimator for the factor loadings. Extensive simulation studies and an application to the NCI60 cell line dataset demonstrate that the proposed method is superior and delivers more biologically meaningful outcomes.
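The "shrinkage adaptive to the loading size" that the abstract attributes to the SSL prior can be illustrated with a small sketch. This is not the authors' GBFA implementation: it shows only the standard spike and slab lasso mixture of two Laplace densities for a single loading, and it omits the MRF graph-smoothing component and the variational EM algorithm entirely. The function names and the parameter values (`theta`, `lam_slab`, `lam_spike`) are illustrative assumptions, not taken from the paper.

```python
import math


def laplace_density(beta, lam):
    """Laplace (double-exponential) density with rate lam, evaluated at beta."""
    return 0.5 * lam * math.exp(-lam * abs(beta))


def slab_probability(beta, theta, lam_slab, lam_spike):
    """Conditional probability that beta came from the slab component.

    theta is the prior mixing weight of the slab (an assumed, fixed value here).
    """
    slab = theta * laplace_density(beta, lam_slab)
    spike = (1.0 - theta) * laplace_density(beta, lam_spike)
    return slab / (slab + spike)


def adaptive_penalty(beta, theta, lam_slab, lam_spike):
    """Effective L1 penalty rate: a weighted mix of the slab and spike rates.

    Loadings near zero get a rate close to lam_spike (strong shrinkage);
    large loadings get a rate close to lam_slab (weak shrinkage).
    """
    p = slab_probability(beta, theta, lam_slab, lam_spike)
    return p * lam_slab + (1.0 - p) * lam_spike


if __name__ == "__main__":
    # Hypothetical settings: diffuse slab (rate 1), sharp spike (rate 20).
    for beta in (0.0, 0.1, 0.5, 2.0):
        print(beta, adaptive_penalty(beta, theta=0.5, lam_slab=1.0, lam_spike=20.0))
```

With these assumed settings, a loading at zero is penalized at nearly the spike rate, while a loading of size 2 is penalized at essentially the slab rate. In the GBFA framework described above, the MRF prior would additionally couple loadings that are adjacent in the graph, which this single-loading sketch does not attempt to model.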
Pages: 109-119
Number of pages: 11