Uncovering co-regulatory modules and gene regulatory networks in the heart through machine learning-based analysis of large-scale epigenomic data

Authors
Vahab, Naima [1,2]
Bonu, Tarun [3]
Kuhlmann, Levin [3]
Ramialison, Mirana [4]
Tyagi, Sonika [1,2]
Affiliations
[1] RMIT Univ, Sch Computat Technol, Melbourne 3000, Australia
[2] Alfred Hosp, Dept Infect Dis, Prahran, Vic 3181, Australia
[3] Monash Univ, Fac Informat Technol, Melbourne 3800, Australia
[4] Murdoch Childrens Res Inst, Murdoch, WA, Australia
Keywords
Area under the ROC curve; Cardiac diseases; CNN; CRM; Epigenomics; Gene regulation; Gene regulatory networks; Random forest; Machine learning; MCOT; Receiver operating characteristic; Transcription factor; DNA SHAPE-FEATURES; MOTIF DATABASE; BINDING; TFBSSHAPE; PATHWAY;
DOI
10.1016/j.compbiomed.2024.108068
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
The availability of large-scale epigenomic data from various cell types and conditions has yielded valuable insights for evaluating and learning the features that predict the co-binding of transcription factors (TFs). However, prior attempts to develop models predicting motif co-occurrence lacked the scalability needed to analyze arbitrary motif combinations globally or to make cross-species predictions. Moreover, mapping co-regulatory modules (CRMs) to gene regulatory networks (GRNs) is crucial for understanding their underlying function. Currently, no comprehensive pipeline exists for large-scale, rapid, and accurate CRM and GRN identification. In this study, we analyzed and evaluated different TF binding characteristics that facilitate biologically significant co-binding, in order to identify all potential clusters of co-binding TFs. We curated the UniBind database, containing ChIP-Seq data from over 1983 samples and 232 TFs, and implemented two machine learning models to predict CRMs and the potential regulatory networks they operate on. The two models, a Convolutional Neural Network (CNN) and a Random Forest Classifier (RFC), were used to predict co-binding between TFs and were compared using precision-recall and Receiver Operating Characteristic (ROC) curves. The CNN outperformed the RFC (AUC 0.94 vs. 0.88) and achieved a higher F1 score (0.938 vs. 0.872). The CRMs generated by the clustering algorithm were validated against ChipAtlas and MCOT, revealing additional motifs forming CRMs. We predicted 200k CRMs for 50k+ human genes, validated against recent CRM prediction methods with 100% overlap. Further, we narrowed our focus to heart-related regulatory motifs, filtering the generated CRMs to report 1784 cardiac CRMs containing at least four cardiac TFs. The identified cardiac CRMs revealed potential novel regulators, such as ARID3A and RXRB for SCAD, alongside known TFs such as PPARG for F11R. Our findings highlight the importance of the NKX family of transcription factors in cardiac development and provide potential targets for further investigation in cardiac disease.
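The abstract compares the two classifiers by AUC and F1 score. As a rough illustration of how these two metrics are computed for a binary co-binding classifier, the sketch below uses plain Python. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration; they are not taken from the paper and do not reproduce its pipeline or its reported numbers.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Fraction of (positive, negative) pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 score after thresholding scores (the threshold is an assumed default)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = TF pair co-binds, 0 = it does not.
    labels = [1, 1, 1, 0, 0, 0]
    model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]  # hypothetical classifier scores
    model_b = [0.9, 0.6, 0.2, 0.7, 0.3, 0.1]  # hypothetical classifier scores
    for name, scores in (("model_a", model_a), ("model_b", model_b)):
        print(name, round(roc_auc(labels, scores), 3),
              round(f1_score(labels, scores), 3))
```

AUC summarizes ranking quality across all thresholds, while F1 depends on a single chosen threshold, which is why the two metrics are usually reported together.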
Pages: 10