Supervised enhancer prediction with epigenetic pattern recognition and targeted validation

Times cited: 59
Authors
Sethi, Anurag [1]
Gu, Mengting [2,3]
Gumusgoz, Emrah [4]
Chan, Landon [5]
Yan, Koon-Kiu [1]
Rozowsky, Joel [1]
Barozzi, Iros [6]
Afzal, Veena [6]
Akiyama, Jennifer A. [6]
Plajzer-Frick, Ingrid [6]
Yan, Chengfei [1]
Novak, Catherine S. [6]
Kato, Momoe [6]
Garvin, Tyler H. [6]
Pham, Quan [6]
Harrington, Anne [6]
Mannion, Brandon J. [6]
Lee, Elizabeth A. [6]
Fukuda-Yuzawa, Yoko [6]
Visel, Axel [6]
Dickel, Diane E. [6]
Yip, Kevin Y. [7]
Sutton, Richard [4]
Pennacchio, Len A. [6]
Gerstein, Mark [1,2,3]
Affiliations
[1] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[2] Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
[3] Yale Univ, Dept Comp Sci, POB 2158, New Haven, CT 06520 USA
[4] Yale Univ, Sch Med, Dept Internal Med, Sect Infect Dis, New Haven, CT 06510 USA
[5] Chinese Univ Hong Kong, Sch Med, Hong Kong, Peoples R China
[6] Lawrence Berkeley Natl Lab, Funct Genom Dept, Berkeley, CA USA
[7] Chinese Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
National Institutes of Health (NIH);
Keywords
TRANSCRIPTION FACTOR-BINDING; DNA ELEMENTS; REGULATORY INFORMATION; HISTONE MODIFICATIONS; CHROMATIN SIGNATURES; GENE-EXPRESSION; HUMAN GENOME; DISCOVERY; MOUSE; ENCYCLOPEDIA;
DOI
10.1038/s41592-020-0907-8
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline code
071010; 081704;
Abstract
Enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated that our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model that effectively distinguishes enhancers from promoters. Supervised machine-learning models trained using Drosophila epigenetic and STARR-seq data can be transferred to predict mouse and human enhancers.
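The abstract mentions "shape-matching filters based on meta-profiles of epigenetic features" but does not spell out their formulation. As a rough illustration only, the sketch below assumes a generic matched-filter approach in the spirit of correlation pattern recognition. It averages signal windows aligned on known enhancers into a meta-profile, then scores each offset of a new signal track by its Pearson correlation with that profile. The function names `meta_profile` and `match_scores`, the choice of Pearson correlation, and the toy data are hypothetical and do not reproduce the authors' implementation.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: a generic matched (shape-matching) filter,
# not the exact method of the paper.


def meta_profile(windows: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length signal windows centred on known enhancers."""
    if not windows:
        raise ValueError("need at least one window")
    n = len(windows[0])
    if any(len(w) != n for w in windows):
        raise ValueError("all windows must have the same length")
    return [sum(w[i] for w in windows) / len(windows) for i in range(n)]


def _zscore(x: Sequence[float]) -> List[float]:
    """Standardize values using the population standard deviation."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if sd == 0:
        return [0.0] * len(x)  # a flat window has no shape to match
    return [(v - mean) / sd for v in x]


def match_scores(signal: Sequence[float], template: Sequence[float]) -> List[float]:
    """Slide the template along the signal and return the Pearson
    correlation between the template and each window of the signal."""
    k = len(template)
    t = _zscore(template)
    scores = []
    for start in range(len(signal) - k + 1):
        w = _zscore(signal[start:start + k])
        # Pearson r equals the mean product of population z-scores.
        scores.append(sum(a * b for a, b in zip(w, t)) / k)
    return scores


if __name__ == "__main__":
    # Toy "bimodal" histone-mark windows around two known enhancers.
    template = meta_profile([[0, 2, 1, 2, 0], [0, 4, 1, 3, 0]])
    scores = match_scores([0, 0, 3, 1, 2.5, 0, 0], template)
    print(template, scores)
```

Because the score is a correlation, it responds to the shape of the signal rather than its absolute height. In a supervised setup like the one the abstract describes, per-mark scores of this kind could serve as features for a classifier; that step is omitted here.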
Pages: 807 / +
Page count: 29