Uncovering co-regulatory modules and gene regulatory networks in the heart through machine learning-based analysis of large-scale epigenomic data

Citations: 0
Authors
Vahab, Naima [1, 2]
Bonu, Tarun [3]
Kuhlmann, Levin [3]
Ramialison, Mirana [4]
Tyagi, Sonika [1, 2]
Affiliations
[1] RMIT Univ, Sch Computat Technol, Melbourne 3000, Australia
[2] Alfred Hosp, Dept Infect Dis, Prahran, Vic 3181, Australia
[3] Monash Univ, Fac Informat Technol, Melbourne 3800, Australia
[4] Murdoch Childrens Res Inst, Murdoch, WA, Australia
Keywords
Area under the ROC curve; Cardiac diseases; CNN; CRM; Epigenomics; Gene regulation; Gene regulatory networks; Random forest; Machine learning; MCOT; Receiver operating characteristic; Transcription factor; DNA SHAPE-FEATURES; MOTIF DATABASE; BINDING; TFBSSHAPE; PATHWAY;
DOI
10.1016/j.compbiomed.2024.108068
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
The availability of large-scale epigenomic data from various cell types and conditions has yielded valuable insights for evaluating and learning features that predict the co-binding of transcription factors (TFs). However, prior attempts to develop models predicting motif co-occurrence lacked the scalability to analyze arbitrary motif combinations globally or to make cross-species predictions. Moreover, mapping co-regulatory modules (CRMs) to gene regulatory networks (GRNs) is crucial for understanding their underlying function. Currently, no comprehensive pipeline exists for large-scale, rapid, and accurate CRM and GRN identification. In this study, we analyzed and evaluated different TF binding characteristics that facilitate biologically significant co-binding, in order to identify all potential clusters of co-binding TFs. We curated the UniBind database, containing ChIP-Seq data from over 1983 samples and 232 TFs, and implemented two machine learning models, a Convolutional Neural Network (CNN) and a Random Forest Classifier (RFC), to predict CRMs and the potential regulatory networks they operate on. The two models, used to predict co-binding between TFs, were compared using precision-recall and Receiver Operating Characteristic (ROC) curves. CNN outperformed RFC (AUC 0.94 vs. 0.88) and achieved a higher F1 score (0.938 vs. 0.872). The CRMs generated by the clustering algorithm were validated against ChIP-Atlas and MCOT, revealing additional motifs forming CRMs. We predicted 200k CRMs for 50k+ human genes, validated against recent CRM prediction methods with 100% overlap. Further, we narrowed our focus to heart-related regulatory motifs, filtering the generated CRMs to report 1784 cardiac CRMs containing at least four cardiac TFs. The identified cardiac CRMs revealed potential novel regulators such as ARID3A and RXRB for SCAD, alongside known TFs such as PPARG for F11R.
Our findings highlight the importance of the NKX family of transcription factors in cardiac development and provide potential targets for further investigation in cardiac disease.
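The cardiac filtering step described in the abstract (keeping only CRMs that contain at least four cardiac TFs) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `filter_cardiac_crms`, the CRM identifiers, the toy CRM memberships, and the short cardiac TF set below are all hypothetical and chosen only for illustration; the record does not give the paper's actual cardiac TF list or data format.

```python
from typing import Dict, Iterable, Set

# Illustrative set of well-known cardiac TFs.
# Assumption: this is NOT the cardiac TF list used in the paper.
CARDIAC_TFS: Set[str] = {"NKX2-5", "GATA4", "TBX5", "MEF2C", "HAND2"}


def filter_cardiac_crms(
    crms: Dict[str, Iterable[str]],
    cardiac_tfs: Set[str],
    min_cardiac: int = 4,
) -> Dict[str, Set[str]]:
    """Keep CRMs whose member TFs include at least `min_cardiac` cardiac TFs.

    Returns a mapping from CRM id to the cardiac TFs found in that CRM.
    """
    kept: Dict[str, Set[str]] = {}
    for crm_id, members in crms.items():
        hits = set(members) & cardiac_tfs  # cardiac TFs present in this CRM
        if len(hits) >= min_cardiac:
            kept[crm_id] = hits
    return kept


# Toy CRMs (hypothetical identifiers and memberships, for illustration only).
toy_crms = {
    "crm_1": ["NKX2-5", "GATA4", "TBX5", "MEF2C", "SP1"],
    "crm_2": ["GATA4", "TBX5", "CTCF"],
    "crm_3": ["NKX2-5", "GATA4", "HAND2", "MEF2C"],
}

cardiac_crms = filter_cardiac_crms(toy_crms, CARDIAC_TFS)
print(sorted(cardiac_crms))  # -> ['crm_1', 'crm_3']
```

Using a set intersection keeps the filter independent of TF order and of duplicate entries within a CRM; the threshold of four is exposed as a parameter so that stricter or looser definitions of a cardiac CRM can be tried.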
Pages: 10