Combining Sequence and Epigenomic Data to Predict Transcription Factor Binding Sites Using Deep Learning

Cited by: 1
Authors
Jing, Fang [1]
Zhang, Shao-Wu [1]
Cao, Zhen [2]
Zhang, Shihua [2, 3]
Affiliations
[1] Northwestern Polytech Univ, Coll Automat, Minist Educ, Key Lab Informat Fusion Technol, Xian 710072, Shaanxi, Peoples R China
[2] Chinese Acad Sci, Acad Math & Syst Sci, NCMIS, CEMS,RCSDS, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Math Sci, Beijing 100049, Peoples R China
Source
BIOINFORMATICS RESEARCH AND APPLICATIONS, ISBRA 2018 | 2018 / Vol. 10847
Funding
National Natural Science Foundation of China
Keywords
Bioinformatics; Machine learning; Transcription factor binding sites; Convolutional neural networks; DNA accessibility; Histone modification; Chromatin accessibility prediction; Networks
DOI
10.1007/978-3-319-94968-0_23
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Knowing the locations of transcription factor binding sites (TFBSs) is essential for modeling the underlying binding mechanisms and the downstream cellular functions. Convolutional neural networks (CNNs) have outperformed conventional methods in predicting TFBSs from the primary DNA sequence. In addition to DNA sequence, histone modifications and chromatin accessibility are important factors influencing TF binding, and they have recently been explored for predicting TFBSs. However, current methods rarely combine histone modifications and chromatin accessibility with CNNs in an integrative framework. To this end, we developed a general CNN model that integrates these data for predicting TFBSs. We systematically benchmarked a series of architecture variants by changing the network structure in terms of width and depth, and explored the effect of sample length at flanking regions. We evaluated the performance of the three types of data and their combinations using 256 ChIP-seq experiments, and compared the model with competing machine learning methods. We find that the contributions of these three types of data are complementary. Moreover, the integrative CNN framework significantly outperforms traditional machine learning methods.
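The abstract does not specify how the three data types are fed to the CNN. As a rough illustration only, the sketch below assumes one plausible encoding: the DNA sequence is one-hot encoded into four channels, and each epigenomic signal (for example a chromatin accessibility track and a histone modification track) is appended as an extra per-base channel. A single convolutional filter with ReLU and global max pooling is then scanned over the stacked input. All function names, the example signal values, and the filter weights are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

BASES = "ACGT"  # channel order assumed for this sketch


def one_hot(seq: str) -> List[List[float]]:
    """Return 4 channels (A, C, G, T), each of length len(seq).

    Unknown bases such as N are left as all zeros.
    """
    seq = seq.upper()
    return [[1.0 if base == b else 0.0 for base in seq] for b in BASES]


def build_input(seq: str, *tracks: Sequence[float]) -> List[List[float]]:
    """Stack one-hot sequence channels with per-base signal tracks."""
    channels = one_hot(seq)
    for track in tracks:
        if len(track) != len(seq):
            raise ValueError("each signal track needs one value per base")
        channels.append([float(v) for v in track])
    return channels


def conv1d_relu_max(channels: List[List[float]],
                    kernel: List[List[float]]) -> float:
    """Scan one filter along the input, apply ReLU, then global max pooling."""
    if len(kernel) != len(channels):
        raise ValueError("kernel and input must have the same channel count")
    width = len(kernel[0])
    length = len(channels[0])
    best = 0.0  # ReLU floor: negative activations never exceed this
    for start in range(length - width + 1):
        score = sum(
            kernel[c][k] * channels[c][start + k]
            for c in range(len(channels))
            for k in range(width)
        )
        best = max(best, score)
    return best


# Hypothetical 6-bp window with made-up signal values.
sequence = "ACGTAC"
accessibility = [0.1, 0.9, 0.8, 0.2, 0.0, 0.0]
histone_mark = [0, 1, 1, 0, 0, 0]
x = build_input(sequence, accessibility, histone_mark)

# Hypothetical width-2 filter: rewards the dinucleotide "CG" plus open chromatin.
cg_filter = [[0, 0], [1, 0], [0, 1], [0, 0], [0.5, 0.5], [0, 0]]
activation = conv1d_relu_max(x, cg_filter)
```

In a real model the filter weights would be learned and many filters would be stacked in a deep learning framework. The point of the sketch is only that sequence and epigenomic signals can share one multi-channel input, so a single filter can respond to a motif and its chromatin context together.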
Pages: 241-252
Page count: 12
Related papers
50 in total
  • [41] bPeaks: a bioinformatics tool to detect transcription factor binding sites from ChIPseq data in yeasts and other organisms with small genomes
    Merhej, Jawad
    Frigo, Amandine
    Le Crom, Stephane
    Camadro, Jean-Michel
    Devaux, Frederic
    Lelandais, Gaelle
    YEAST, 2014, 31 (10) : 375 - 391
  • [42] Occupancy Classification of Position Weight Matrix-Inferred Transcription Factor Binding Sites
    Wright, Hollis
    Cohen, Aaron
    Soenmez, Kemal
    Yochum, Gregory
    McWeeney, Shannon
    PLOS ONE, 2011, 6 (11):
  • [43] Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility
    Liu, Sheng
    Zibetti, Cristina
    Wan, Jun
    Wang, Guohua
    Blackshaw, Seth
    Qian, Jiang
    BMC BIOINFORMATICS, 2017, 18
  • [44] Identifying Functional Transcription Factor Binding Sites in Yeast by Considering Their Positional Preference in the Promoters
    Lai, Fu-Jou
    Chiu, Chia-Chun
    Yang, Tzu-Hsien
    Huang, Yueh-Min
    Wu, Wei-Sheng
    PLOS ONE, 2013, 8 (12):
  • [45] Drug Resistance Prediction Using Deep Learning Techniques on HIV-1 Sequence Data
    Steiner, Margaret C.
    Gibson, Keylie M.
    Crandall, Keith A.
    VIRUSES-BASEL, 2020, 12 (05):
  • [46] Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data
    Borgman, Jacob
    Stark, Karen
    Carson, Jeremy
    Hauser, Loren
    FRONTIERS IN BIOINFORMATICS, 2022, 2
  • [47] G = MAT: Linking Transcription Factor Expression and DNA Binding Data
    Tretyakov, Konstantin
    Laur, Sven
    Vilo, Jaak
    PLOS ONE, 2011, 6 (01):
  • [48] Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility
    Liu, Sheng
    Zibetti, Cristina
    Wan, Jun
    Wang, Guohua
    Blackshaw, Seth
    Qian, Jiang
    BMC BIOINFORMATICS, 2017, 18
  • [49] Combining survey and census data for improved poverty prediction using semi-supervised deep learning
    Echevin, Damien
    Fotso, Guy
    Bouroubi, Yacine
    Coulombe, Harold
    Li, Qing
    JOURNAL OF DEVELOPMENT ECONOMICS, 2025, 172
  • [50] Prediction of zinc binding sites in proteins using sequence derived information
    Srivastava, Abhishikha
    Kumar, Manish
    JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2018, 36 (16) : 4413 - 4423