Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice

Cited by: 6
Authors
Ni, Pengyu [1]
Moe, Joshua [1]
Su, Zhengchang [1]
Affiliations
[1] Univ North Carolina Charlotte, Dept Bioinformat & Genom, Charlotte, NC 28223 USA
Funding
US National Science Foundation
Keywords
cis-regulatory modules; Enhancers; Functional states; Machine-learning; Predictions; CHROMATIN SIGNATURES; TERMINAL DIFFERENTIATION; CELL-TYPES; ENHANCERS; TRANSCRIPTION; ANNOTATION; PROMOTERS; DISCOVERY; PROTEINS; ELEMENTS;
DOI
10.1186/s12915-022-01426-9
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Predicting cis-regulatory modules (CRMs) in a genome and predicting their functional states in various cell/tissue types of the organism are two related, challenging computational tasks. Most current methods attempt to achieve both simultaneously using data of multiple epigenetic marks in a cell/tissue type. Though conceptually attractive, they suffer from high false discovery rates and limited applicability. To fill these gaps, we proposed a two-step strategy: first predict a map of CRMs in the genome, then predict the functional states of all the CRMs in various cell/tissue types of the organism. We have recently developed an algorithm for the first step that predicts CRMs in a genome more accurately and completely than existing methods by integrating numerous transcription factor ChIP-seq datasets in the organism. Here, we present machine-learning methods for the second step.
Results: We showed that the functional states of all the CRMs in the genome in a given cell/tissue type could be accurately predicted using data of only 1 to 4 epigenetic marks by a variety of machine-learning classifiers. Our predictions are substantially more accurate than the best achieved so far. Interestingly, a model trained on a cell/tissue type in humans can accurately predict functional states of CRMs in different cell/tissue types of humans as well as of mice, and vice versa. Therefore, the epigenetic code that defines functional states of CRMs in various cell/tissue types is universal at least in humans and mice. Moreover, we found that tens to hundreds of thousands of CRMs were active in a human or mouse cell/tissue type, and up to 99.98% of them were reused in different cell/tissue types, while as few as 0.02% were unique to a cell/tissue type and might define it.
Conclusions: Our two-step approach can accurately predict the functional states of all the CRMs in the genome in any cell/tissue type using data of only 1 to 4 epigenetic marks. Our approach is also more cost-effective than existing methods, which typically use data of more epigenetic marks. Our results suggest common epigenetic rules for defining functional states of CRMs in various cell/tissue types in humans and mice.
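The second step described in the abstract (classifying each CRM as active or not from a few epigenetic mark signals, then reusing the trained model on another cell/tissue type) can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the data are synthetic, the four marks are unnamed placeholders, a hand-written logistic regression stands in for the "variety of machine-learning classifiers", the per-dataset standardization is an assumption of this sketch, and every function name is hypothetical.

```python
import math
import random


def make_cell_type(n, shift, rng):
    """Synthetic CRMs: four placeholder mark signals plus a 0/1 'active' label."""
    rows, labels = [], []
    for _ in range(n):
        active = rng.random() < 0.5
        # Assumption: active CRMs carry higher signal on average; `shift`
        # mimics a baseline difference between cell types or species.
        mean = (1.5 if active else 0.0) + shift
        rows.append([rng.gauss(mean, 1.0) for _ in range(4)])
        labels.append(1 if active else 0)
    return rows, labels


def standardize(rows):
    """Z-score each mark within one dataset so baselines are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=200, lr=0.1):
    """Plain batch gradient descent on the logistic loss."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(model, rows, labels):
    """Fraction of CRMs whose predicted state matches the label."""
    w, b = model
    hits = sum((b + sum(wi * xi for wi, xi in zip(w, x)) > 0) == bool(y)
               for x, y in zip(rows, labels))
    return hits / len(rows)


rng = random.Random(0)
train_x, train_y = make_cell_type(400, shift=0.0, rng=rng)  # "cell type A"
test_x, test_y = make_cell_type(400, shift=2.0, rng=rng)    # "cell type B"

model = train_logistic(standardize(train_x), train_y)
acc_same = accuracy(model, standardize(train_x), train_y)
acc_transfer = accuracy(model, standardize(test_x), test_y)
print(f"same cell type: {acc_same:.3f}, transferred: {acc_transfer:.3f}")
```

In this toy setting the model transfers because both synthetic datasets share the same rule relating signal to state; whether and how well that holds for real data is the question the paper itself addresses.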
Pages: 29