Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice

Cited by: 6
Authors
Ni, Pengyu [1]
Moe, Joshua [1]
Su, Zhengchang [1]
Affiliations
[1] Univ North Carolina Charlotte, Dept Bioinformat & Genom, Charlotte, NC 28223 USA
Funding
National Science Foundation (USA);
Keywords
cis-regulatory modules; Enhancers; Functional states; Machine-learning; Predictions; CHROMATIN SIGNATURES; TERMINAL DIFFERENTIATION; CELL-TYPES; ENHANCERS; TRANSCRIPTION; ANNOTATION; PROMOTERS; DISCOVERY; PROTEINS; ELEMENTS;
DOI
10.1186/s12915-022-01426-9
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Background: Predicting cis-regulatory modules (CRMs) in a genome and their functional states in various cell/tissue types of the organism are two related challenging computational tasks. Most current methods attempt to simultaneously achieve both using data of multiple epigenetic marks in a cell/tissue type. Though conceptually attractive, they suffer high false discovery rates and limited applications. To fill the gaps, we proposed a two-step strategy to first predict a map of CRMs in the genome, and then predict functional states of all the CRMs in various cell/tissue types of the organism. We have recently developed an algorithm for the first step that was able to more accurately and completely predict CRMs in a genome than existing methods by integrating numerous transcription factor ChIP-seq datasets in the organism. Here, we presented machine-learning methods for the second step.
Results: We showed that functional states in a cell/tissue type of all the CRMs in the genome could be accurately predicted using data of only 1 to 4 epigenetic marks by a variety of machine-learning classifiers. Our predictions are substantially more accurate than the best achieved so far. Interestingly, a model trained on a cell/tissue type in humans can accurately predict functional states of CRMs in different cell/tissue types of humans as well as of mice, and vice versa. Therefore, epigenetic code that defines functional states of CRMs in various cell/tissue types is universal at least in humans and mice. Moreover, we found that from tens to hundreds of thousands of CRMs were active in a human and mouse cell/tissue type, and up to 99.98% of them were reutilized in different cell/tissue types, while as small as 0.02% of them were unique to a cell/tissue type that might define the cell/tissue type.
Conclusions: Our two-step approach can accurately predict functional states in any cell/tissue type of all the CRMs in the genome using data of only 1 to 4 epigenetic marks. Our approach is also more cost-effective than existing methods that typically use data of more epigenetic marks. Our results suggest common epigenetic rules for defining functional states of CRMs in various cell/tissue types in humans and mice.
Pages: 29