Bayesian shrinkage models for integration and analysis of multiplatform high-dimensional genomics data

Cited by: 1
Authors
Xue, Hao [1]
Chakraborty, Sounak [2, 4]
Dey, Tanujit [3]
Affiliations
[1] Cornell Univ, Dept Computat Biol, Ithaca, NY USA
[2] Univ Missouri, Dept Stat, Columbia, MO USA
[3] Harvard Med Sch, Brigham & Womens Hosp, Ctr Surg & Publ Hlth, Dept Surg, Boston, MA USA
[4] Univ Missouri, Dept Stat, C209F Middlebush Hall, Columbia, MO 65211 USA
Keywords
data integration; Expectation Maximization; glioblastoma; hierarchical Bayesian model; multiomics; variable selection; DNA methylation; penalized likelihood; expression; interleukin-8
DOI
10.1002/sam.11682
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With the increasing availability of biomedical data from multiple platforms (such as epigenomics, gene expression, and clinical features) collected on the same patients in clinical research, there is a growing need for statistical methods that can jointly analyze data from different platforms and thereby provide complementary information for clinical studies. In this paper, we propose a two-stage hierarchical Bayesian model that integrates high-dimensional biomedical data from diverse platforms to select biomarkers associated with clinical outcomes of interest. In the first stage, we use an Expectation Maximization (EM)-based approach to learn the regulating mechanism between epigenomics (e.g., gene methylation) and gene expression while taking functional gene annotations into account. In the second stage, we group genes according to the regulating mechanism learned in the first stage and then apply a group-wise penalty to select genes significantly associated with clinical outcomes while incorporating clinical features. Simulation studies suggest that our model-based data integration method produces fewer false positives in selecting predictive variables than an existing method. Moreover, analysis of a real glioblastoma (GBM) dataset demonstrates the potential of our method to detect genes associated with GBM survival with higher accuracy than the existing method. Furthermore, the existing literature confirms that most of the selected biomarkers are crucial in GBM prognosis.
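The abstract describes the two stages only at a high level. The following is a minimal, standard-library-only Python sketch of the general idea, built on loud assumptions: each gene is summarized by a single hypothetical methylation-expression correlation (the paper's first stage is a hierarchical Bayesian model that also uses functional annotations), a plain two-component Gaussian mixture fitted by EM stands in for the learned regulating mechanism, and a lasso with group-specific penalty weights on a squared-error response stands in for the paper's group-wise penalty on GBM survival. All function names, data, and penalty levels below are hypothetical.

```python
import math


def soft_threshold(z, lam):
    """Proximal operator of lam * |b|: shrink z toward zero by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def em_two_gaussians(x, n_iter=100):
    """Fit a 1-D two-component Gaussian mixture to x by EM.

    Returns (weights, means, variances, resp); resp[i] is the posterior
    probability that x[i] belongs to component 1.
    """
    n = len(x)
    mean_x = sum(x) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    # Start the components at the extremes of the data with the pooled variance.
    pi, mu, var = [0.5, 0.5], [min(x), max(x)], [var_x, var_x]
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: responsibilities via a numerically stable logistic of log densities.
        for i, xi in enumerate(x):
            logd = [math.log(pi[k]) - 0.5 * math.log(2 * math.pi * var[k])
                    - (xi - mu[k]) ** 2 / (2 * var[k]) for k in (0, 1)]
            d = logd[0] - logd[1]
            resp[i] = math.exp(-d) / (1 + math.exp(-d)) if d > 0 else 1 / (1 + math.exp(d))
        # M-step: weighted updates of mixing weights, means and (floored) variances.
        n1 = max(sum(resp), 1e-12)
        n0 = max(n - n1, 1e-12)
        pi = [n0 / n, n1 / n]
        mu = [sum((1 - r) * xi for r, xi in zip(resp, x)) / n0,
              sum(r * xi for r, xi in zip(resp, x)) / n1]
        var = [max(sum((1 - r) * (xi - mu[0]) ** 2 for r, xi in zip(resp, x)) / n0, 1e-6),
               max(sum(r * (xi - mu[1]) ** 2 for r, xi in zip(resp, x)) / n1, 1e-6)]
    return pi, mu, var, resp


def weighted_lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + sum_j lam[j] * |b[j]|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - Xb (b starts at zero)
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0:
                continue
            rho = sum(c * (ri + c * b[j]) for c, ri in zip(col, r)) / n
            new = soft_threshold(rho, lam[j]) / z
            if new != b[j]:
                r = [ri - c * (new - b[j]) for ri, c in zip(r, col)]
                b[j] = new
    return b


# Stage 1 (hypothetical input): one methylation-expression correlation per gene.
corr = [-0.72, -0.65, -0.80, -0.70, -0.68, 0.05, -0.02, 0.10, 0.01, -0.06]
pi, mu, var, resp = em_two_gaussians(corr)
reg = 0 if mu[0] < mu[1] else 1  # treat the more negative component as "regulated"
post_reg = [1 - q if reg == 0 else q for q in resp]
groups = [1 if q > 0.5 else 0 for q in post_reg]  # 1 = methylation-regulated

# Stage 2: a lighter penalty for the regulated group (assumed levels 0.2 vs 0.8).
lam = [0.2 if g == 1 else 0.8 for g in groups]
# Toy orthogonal +/-1 design: columns 1..10 of a 16 x 16 Sylvester Hadamard matrix.
X = [[1.0 if bin(i & j).count("1") % 2 == 0 else -1.0 for j in range(1, 11)]
     for i in range(16)]
beta_true = [1.0, 0.5, 0.0, 0.0, 0.0, 0.5, 1.0, 0.0, 0.0, 0.0]
y = [sum(xij * bj for xij, bj in zip(row, beta_true)) for row in X]  # noise-free
b = weighted_lasso_cd(X, y, lam)
selected = [j for j, bj in enumerate(b) if bj != 0]
print(groups)    # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(selected)  # [0, 1, 6]
```

Because the toy design is orthogonal, each coefficient is simply its true effect soft-thresholded at its group's penalty, which makes the effect of grouping visible: genes 1 and 5 have the same true effect, but only gene 1, in the lightly penalized group, is retained.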
Pages: 17
Related papers (50 in total)
  • [21] Shrinkage and LASSO strategies in high-dimensional heteroscedastic models
    Nkurunziza, Severien
    Al-Momani, Marwan
    Lin, Eric Yu Yin
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2016, 45 (15) : 4454 - 4470
  • [22] Graph-guided Bayesian SVM with Adaptive Structured Shrinkage Prior for High-dimensional Data
    Sun, Wenli
    Chang, Changgee
    Long, Qi
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021 : 4472 - 4479
  • [23] Bayesian model selection for high-dimensional Ising models, with applications to educational data
    Park, Jaewoo
    Jin, Ick Hoon
    Schweinberger, Michael
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2022, 165
  • [24] IMPROVING HIGH-DIMENSIONAL PHYSICS MODELS THROUGH BAYESIAN CALIBRATION WITH UNCERTAIN DATA
    Kumar, Natarajan Chennimalai
    Subramaniyan, Arun K.
    Wang, Liping
    PROCEEDINGS OF THE ASME TURBO EXPO 2012, VOL 7, PTS A AND B, 2012 : 407+
  • [25] Rejoinder to 'Post-selection shrinkage estimation for high-dimensional data analysis'
    Gao, Xiaoli
    Ahmed, S. Ejaz
    Feng, Yang
    APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, 2017, 33 (02) : 131 - 135
  • [26] High-Dimensional Bayesian Semiparametric Models for Small Samples: A Principled Approach to the Analysis of Cytokine Expression Data
    Poli, Giovanni
    Argiento, Raffaele
    Amedei, Amedeo
    Stingo, Francesco C.
    BIOMETRICAL JOURNAL, 2024, 66 (08)
  • [27] Shrinkage Estimation of High-Dimensional Factor Models with Structural Instabilities
    Cheng, Xu
    Liao, Zhipeng
    Schorfheide, Frank
    REVIEW OF ECONOMIC STUDIES, 2016, 83 (04) : 1511 - 1543
  • [28] Shrinkage Ridge Regression Estimators in High-Dimensional Linear Models
    Yuzbasi, Bahadir
    Ahmed, S. Ejaz
    PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING MANAGEMENT, 2015, 362 : 793 - 807
  • [29] Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data
    Serra, Angela
    Coretto, Pietro
    Fratello, Michele
    Tagliaferri, Roberto
    BIOINFORMATICS, 2018, 34 (04) : 625 - 634
  • [30] High-dimensional Bayesian optimization with a combination of Kriging models
    Appriou, Tanguy
    Rulliere, Didier
    Gaudrie, David
    STRUCTURAL AND MULTIDISCIPLINARY OPTIMIZATION, 2024, 67 (11)