Bi-EB: Empirical Bayesian Biclustering for Multi-Omics Data Integration Pattern Identification among Species

Cited by: 2
Authors
Yazdanparast, Aida [1, 2, 3]
Li, Lang [1, 2, 3, 4]
Zhang, Chi [1, 2]
Cheng, Lijun [4]
Affiliations
[1] Indiana Univ, Sch Med, Ctr Computat Biol & Bioinformat, Indianapolis, IN 46202 USA
[2] Indiana Univ, Sch Informat, Dept Biohlth Informat, Indianapolis, IN 46202 USA
[3] Indiana Univ, Sch Med, Dept Med & Mol Genet, Indianapolis, IN 46202 USA
[4] Ohio State Univ, Coll Med, Dept Biomed Informat, Columbus, OH 43210 USA
Keywords
biclustering; multi-omics data analysis; breast cancer; tumor and cancer cell lines; BREAST-CANCER; CELL-LINES; EXPRESSION; SUBTYPES; MODELS; DISCOVERY;
DOI
10.3390/genes13111982
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
Although several biclustering algorithms have been studied, few are used for cross-pattern identification across species using multi-omics data mining. A fast empirical Bayesian biclustering (Bi-EB) algorithm is developed to detect patterns shared both across integrated omics data and between species. Bi-EB addresses a clinically critical translational question with a bioinformatics strategy: how modules of genotype variation associated with phenotype can be identified from cancer cell screening data, and how these findings can be directly translated to a cancer patient subpopulation. An empirical Bayesian probabilistic interpretation and a ratio strategy are proposed in Bi-EB for the first time to detect pairwise regulation patterns among species and variations across multiple omics at the gene level, such as protein and mRNA. An expectation-maximization (EM) optimization algorithm is used to extract the foreground co-current variations from the background noise by adjusting two parameters, the bicluster membership probability threshold Ac and the bicluster average probability p. Three simulation experiments and two real biological mRNA and protein data analyses, conducted on The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE), verify that Bi-EB significantly improves clustering recovery and relevance accuracy, reaching a recovery score of 0.98 and a relevance score of 0.99 and outperforming seven other biclustering methods: Cheng and Church (CC), xMOTIFs, BiMax, Plaid, Spectral, FABIA, and QUBIC. In addition, Bi-EB is used to determine the shared causality patterns from mRNA to protein between patients and cancer cells in TCGA and CCLE breast cancer data.
The clinically well-known treatment target protein module of estrogen receptor (ER), ER (p118), AR, BCL2, cyclin E1, and IGFBP2 is identified in accordance with its mRNA expression variations in the luminal-like subtype. Ten genes, including CCNB1, CDH1, KDR, RAB25, and PRKCA, are found for the first time to maintain high mRNA-protein accordance in both breast cancer patients and cell lines of the basal-like subtype. Bi-EB provides a useful biclustering analysis tool for discovering cross patterns hidden both in multiple data matrices (omics) and across species. Implementing Bi-EB in the clinical setting will have a direct impact on conducting translational research guided by cancer cell screening.
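The abstract's idea of separating a foreground signal from background noise with EM and then applying a membership probability threshold can be illustrated with a generic two-component Gaussian mixture. The sketch below is not the Bi-EB algorithm: the function names `normal_pdf` and `em_foreground`, the one-dimensional setting, the Gaussian components, the initialisation, and the default threshold of 0.5 are all assumptions made purely for illustration.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_foreground(values, threshold=0.5, n_iter=200, tol=1e-9):
    """Fit a two-component Gaussian mixture by EM (illustrative, not Bi-EB).

    Component 0 plays the role of background, component 1 of foreground.
    Returns (flags, posteriors), where flags[i] is True when the posterior
    foreground probability of values[i] exceeds `threshold`.
    """
    n = len(values)
    mean_all = sum(values) / n
    var_all = sum((v - mean_all) ** 2 for v in values) / n + 1e-6
    # Assumed crude start: background at the overall mean,
    # foreground at the largest observed value.
    mu = [mean_all, max(values)]
    var = [var_all, var_all]
    weight = [0.5, 0.5]
    post = [0.5] * n
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each value is foreground.
        ll = 0.0
        for i, v in enumerate(values):
            bg = weight[0] * normal_pdf(v, mu[0], var[0])
            fg = weight[1] * normal_pdf(v, mu[1], var[1])
            total = bg + fg + 1e-300  # guard against log(0)
            post[i] = fg / total
            ll += math.log(total)
        # M-step: re-estimate weights, means, and variances.
        n_fg = max(sum(post), 1e-12)
        n_bg = max(n - n_fg, 1e-12)
        weight = [n_bg / n, n_fg / n]
        mu = [
            sum((1.0 - p) * v for p, v in zip(post, values)) / n_bg,
            sum(p * v for p, v in zip(post, values)) / n_fg,
        ]
        var = [
            sum((1.0 - p) * (v - mu[0]) ** 2 for p, v in zip(post, values)) / n_bg + 1e-6,
            sum(p * (v - mu[1]) ** 2 for p, v in zip(post, values)) / n_fg + 1e-6,
        ]
        # Stop once the log-likelihood no longer improves.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return [p > threshold for p in post], post


# Synthetic data (assumed): 200 background values and 50 planted foreground values.
random.seed(0)
background = [random.gauss(0.0, 1.0) for _ in range(200)]
foreground = [random.gauss(5.0, 1.0) for _ in range(50)]
flags, post = em_foreground(background + foreground, threshold=0.5)
print(sum(flags[200:]), "of 50 planted values flagged;",
      sum(flags[:200]), "of 200 background values flagged")
```

Raising `threshold` makes the foreground call stricter, which is the kind of trade-off a membership probability threshold controls; how Bi-EB defines and tunes its own threshold and average probability is described in the paper, not here.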
Pages: 21