Identification and characterization of constrained non-exonic bases lacking predictive epigenomic and transcription factor binding annotations

Cited by: 1
Authors
Grujic, Olivera [1, 2]
Phung, Tanya N. [3]
Kwon, Soo Bin [2, 3]
Arneson, Adriana [2, 3]
Lee, Yuju [1]
Lohmueller, Kirk E. [3, 4, 5]
Ernst, Jason [1, 2, 3, 6, 7, 8]
Affiliations
[1] Univ Calif Los Angeles, Comp Sci Dept, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Biol Chem, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Interdept Program Bioinformat, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Ecol & Evolutionary Biol, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
[6] Univ Calif Los Angeles, Eli & Edythe Broad Ctr Regenerat Med & Stem Cell, Los Angeles, CA 90095 USA
[7] Univ Calif Los Angeles, Jonsson Comprehens Canc Ctr, Los Angeles, CA 90095 USA
[8] Univ Calif Los Angeles, Mol Biol Inst, Los Angeles, CA 90095 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
DNA ELEMENTS; SYSTEMATIC DISCOVERY; NONCODING VARIANTS; ENCYCLOPEDIA; MUTATIONS; FRAMEWORK;
DOI
10.1038/s41467-020-19962-9
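Chinese Library Classification codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Discipline classification codes
07; 0710; 09
Abstract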
Annotations of evolutionary sequence constraint based on multi-species genome alignments and genome-wide maps of epigenomic marks and transcription factor binding provide important complementary information for understanding the human genome and genetic variation. Here we developed the Constrained Non-Exonic Predictor (CNEP) to quantify the evidence that each base in the genome lies in an evolutionarily constrained non-exonic element, using an input of over 60,000 epigenomic and transcription factor binding features. We find that the CNEP score outperforms baseline and related existing scores at predicting evolutionarily constrained non-exonic bases from such data. However, a subset of these bases is still not well predicted by CNEP. We developed a complementary Conservation Signature Score by CNEP (CSS-CNEP) that is predictive of those bases. We further characterize the nature of constrained non-exonic bases with low CNEP scores using additional types of information. CNEP and CSS-CNEP are resources for analyzing constrained non-exonic bases in the genome.
Genome-wide maps of evolutionary constraint and large-scale compendia of epigenomic and transcription factor data provide complementary information for genome annotation. Here, the authors develop the Constrained Non-Exonic Predictor (CNEP), which enables a better understanding of the relationship between the two.
Pages: 16