Identification and characterization of constrained non-exonic bases lacking predictive epigenomic and transcription factor binding annotations

Cited by: 1
Authors
Grujic, Olivera [1, 2]
Phung, Tanya N. [3]
Kwon, Soo Bin [2, 3]
Arneson, Adriana [2, 3]
Lee, Yuju [1]
Lohmueller, Kirk E. [3, 4, 5]
Ernst, Jason [1, 2, 3, 6, 7, 8]
Affiliations
[1] Univ Calif Los Angeles, Comp Sci Dept, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Biol Chem, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Interdept Program Bioinformat, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Ecol & Evolutionary Biol, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
[6] Univ Calif Los Angeles, Eli & Edythe Broad Ctr Regenerat Med & Stem Cell, Los Angeles, CA 90095 USA
[7] Univ Calif Los Angeles, Jonsson Comprehens Canc Ctr, Los Angeles, CA 90095 USA
[8] Univ Calif Los Angeles, Mol Biol Inst, Los Angeles, CA 90095 USA
Funding
U.S. National Institutes of Health; U.S. National Science Foundation
Keywords
DNA ELEMENTS; SYSTEMATIC DISCOVERY; NONCODING VARIANTS; ENCYCLOPEDIA; MUTATIONS; FRAMEWORK;
DOI
10.1038/s41467-020-19962-9
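Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract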
Annotations of evolutionary sequence constraint based on multi-species genome alignments and genome-wide maps of epigenomic marks and transcription factor binding provide important complementary information for understanding the human genome and genetic variation. Here we developed the Constrained Non-Exonic Predictor (CNEP) to quantify the evidence that each base in the genome lies in an evolutionarily constrained non-exonic element, using an input of over 60,000 epigenomic and transcription factor binding features. We find that the CNEP score outperforms baseline and related existing scores at predicting evolutionarily constrained non-exonic bases from such data. However, a subset of these bases is still not well predicted by CNEP. We developed a complementary Conservation Signature Score by CNEP (CSS-CNEP) that is predictive of those bases. We further characterize the nature of constrained non-exonic bases with low CNEP scores using additional types of information. CNEP and CSS-CNEP are resources for analyzing constrained non-exonic bases in the genome.
Editor's summary: Genome-wide maps of evolutionary constraint and large-scale compendia of epigenomic and transcription factor data provide complementary information for genome annotation. Here, the authors develop the Constrained Non-Exonic Predictor (CNEP), which enables a better understanding of the relationship between the two.
Pages: 16