Bayesian network-driven clustering analysis with feature selection for high-dimensional multi-modal molecular data

Cited by: 2
Authors
Zhao, Yize [1]
Chang, Changgee [2]
Hannum, Margaret [3]
Lee, Jasme [3]
Shen, Ronglai [3]
Affiliations
[1] Yale Univ, Dept Biostat, New Haven, CT 06511 USA
[2] Univ Penn, Dept Biostat Epidemiol & Informat, Perelman Sch Med, Philadelphia, PA 19104 USA
[3] Mem Sloan Kettering Canc Ctr, Dept Epidemiol & Biostat, New York, NY 10021 USA
Keywords
BASKET TRIALS; CELL; TUMORS; IDENTIFICATION
DOI
10.1038/s41598-021-84514-0
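Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]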
Discipline classification code
07; 0710; 09
Abstract
Multi-modal molecular profiling data in bulk tumors or single cells are accumulating at a fast pace. There is a great need for developing statistical and computational methods to reveal molecular structures in complex data types toward biological discoveries. Here, we introduce Nebula, a novel Bayesian integrative clustering analysis for high dimensional multi-modal molecular data to identify directly interpretable clusters and associated biomarkers in a unified and biologically plausible framework. To facilitate computational efficiency, a variational Bayes approach is developed to approximate the joint posterior distribution to achieve model inference in high-dimensional settings. We describe a pan-cancer data analysis of genomic, epigenomic, and transcriptomic alterations in close to 9000 tumor samples across canonical oncogenic signaling pathways, immune and stemness phenotype, with comparisons to state-of-the-art clustering methods. We demonstrate that Nebula has the unique advantage of revealing patterns on the basis of shared pathway alterations, offering biological and clinical insights beyond tumor type and histology in the pan-cancer analysis setting. We also illustrate the utility of Nebula in single cell data for immune cell decomposition in peripheral blood samples.
Pages: 11