Supervised enhancer prediction with epigenetic pattern recognition and targeted validation

Cited by: 59
Authors
Sethi, Anurag [1]
Gu, Mengting [2, 3]
Gumusgoz, Emrah [4]
Chan, Landon [5]
Yan, Koon-Kiu [1]
Rozowsky, Joel [1]
Barozzi, Iros [6]
Afzal, Veena [6]
Akiyama, Jennifer A. [6]
Plajzer-Frick, Ingrid [6]
Yan, Chengfei [1]
Novak, Catherine S. [6]
Kato, Momoe [6]
Garvin, Tyler H. [6]
Pham, Quan [6]
Harrington, Anne [6]
Mannion, Brandon J. [6]
Lee, Elizabeth A. [6]
Fukuda-Yuzawa, Yoko [6]
Visel, Axel [6]
Dickel, Diane E. [6]
Yip, Kevin Y. [7]
Sutton, Richard [4]
Pennacchio, Len A. [6]
Gerstein, Mark [1, 2, 3]
Affiliations
[1] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[2] Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
[3] Yale Univ, Dept Comp Sci, POB 2158, New Haven, CT 06520 USA
[4] Yale Univ, Sch Med, Dept Internal Med, Sect Infect Dis, New Haven, CT 06510 USA
[5] Chinese Univ Hong Kong, Sch Med, Hong Kong, Peoples R China
[6] Lawrence Berkeley Natl Lab, Funct Genom Dept, Berkeley, CA USA
[7] Chinese Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
National Institutes of Health (USA);
Keywords
TRANSCRIPTION FACTOR-BINDING; DNA ELEMENTS; REGULATORY INFORMATION; HISTONE MODIFICATIONS; CHROMATIN SIGNATURES; GENE-EXPRESSION; HUMAN GENOME; DISCOVERY; MOUSE; ENCYCLOPEDIA;
DOI
10.1038/s41592-020-0907-8
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated that our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model that effectively distinguishes enhancers and promoters. Supervised machine-learning models trained using Drosophila epigenetic and STARR-seq data can be transferred to predict mouse and human enhancers.
Pages: 807 / +
Page count: 29
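
The shape-matching idea mentioned in the abstract can be illustrated with a generic matched filter. The sketch below is not the authors' implementation: the function names, the two-peaked synthetic shape, the window width, and the noise level are all assumptions chosen for illustration. It averages signal windows around known positive sites into a meta-profile, then slides that profile along the signal so that regions with a similar shape receive high scores.

```python
import numpy as np

def build_metaprofile(signal, centers, half_width):
    """Average the signal over fixed-width windows centred on known positives."""
    windows = [signal[c - half_width : c + half_width + 1] for c in centers]
    return np.mean(windows, axis=0)

def matched_filter_scores(signal, profile):
    """Cross-correlate the signal with a zero-mean, unit-norm copy of the profile."""
    template = profile - profile.mean()
    template = template / np.linalg.norm(template)
    # mode="same" keeps the output aligned with signal positions
    return np.correlate(signal, template, mode="same")

# --- synthetic data (hypothetical; stands in for one epigenetic signal track) ---
rng = np.random.default_rng(0)
half_width = 100
x = np.arange(-half_width, half_width + 1)
# assumed two-peaked shape, used only to give the filter something to match
shape = np.exp(-(x + 30) ** 2 / (2 * 8.0 ** 2)) + np.exp(-(x - 30) ** 2 / (2 * 8.0 ** 2))

signal = rng.normal(0.0, 0.05, 2000)
training_centers = [300, 900, 1500]   # sites used to build the meta-profile
held_out_center = 1200                # site the filter has not seen
for c in training_centers + [held_out_center]:
    signal[c - half_width : c + half_width + 1] += shape

profile = build_metaprofile(signal, training_centers, half_width)
scores = matched_filter_scores(signal, profile)

# locate the best-scoring position in a region around the held-out site
best = int(np.argmax(scores[1000:1400])) + 1000
print("best-scoring position near held-out site:", best)
```

In a supervised setting, per-region scores of this kind, computed for several signal tracks, could serve as input features to a classifier; which features and classifier to use is left open here.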