HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models

Times cited: 155
Authors
Kulakovskiy, Ivan V. [1, 2]
Vorontsov, Ilya E. [2]
Yevshin, Ivan S. [3, 4]
Soboleva, Anastasiia V. [5]
Kasianov, Artem S. [2]
Ashoor, Haitham [6]
Ba-alawi, Wail [6]
Bajic, Vladimir B. [6]
Medvedeva, Yulia A. [2, 7]
Kolpakov, Fedor A. [3, 4]
Makeev, Vsevolod J. [1, 2, 5]
Affiliations
[1] Russian Acad Sci, Engelhardt Inst Mol Biol, GSP-1, Vavilova 32, Moscow 119991, Russia
[2] Russian Acad Sci, Vavilov Inst Gen Genet, GSP-1, Gubkina 3, Moscow 119991, Russia
[3] Russian Acad Sci, Siberian Branch, Design Technol Inst Digital Tech, Academician Rzhanov 6, Novosibirsk 630090, Russia
[4] Inst Syst Biol Ltd, Off 901, Krasina 54, Novosibirsk 630112, Russia
[5] Moscow Inst Phys & Technol, Institutskiy Per 9, Dolgoprudnyi 141700, Moscow Region, Russia
[6] KAUST, CBRC, Thuwal 23955-6900, Saudi Arabia
[7] Russian Acad Sci, Ctr Bioengn, 60 Letiya Oktyabrya 7-2, Moscow 117312, Russia
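Funding
Russian Foundation for Basic Research; Russian Science Foundation
Keywords
CHIP-SEQ DATA; DNA-BINDING; HUMAN GENOME; DATABASE; EXPRESSION; SEQUENCES; ELEMENTS; MOTIFS; MUTATIONS; DISCOVERY
DOI
10.1093/nar/gkv1249
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract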
Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest collection to date of dinucleotide PWM (diPWM) models, covering 86 human and 52 mouse TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with the resulting models validated on in vivo data. To facilitate practical applications, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
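The abstract mentions motif finding with PWMs and score thresholds tied to statistical significance. The Python sketch below illustrates the general idea for a mononucleotide PWM: score every window of a sequence on both strands, and relate a score threshold to a P-value by computing the exact score distribution of a random word with dynamic programming. It is not the HOCOMOCO command-line tools and does not reproduce their algorithms or file formats. The toy matrix `TOY_PWM`, all function names, the uniform background, and the `resolution` discretization parameter are hypothetical choices made for illustration. The P-value here is for a single position on a single strand.

```python
from collections import defaultdict

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
UNIFORM = (0.25, 0.25, 0.25, 0.25)  # assumed i.i.d. background (A, C, G, T)

# Hypothetical toy PWM of log-odds weights: rows are motif positions,
# columns are A, C, G, T. Consensus ACGG. NOT a real HOCOMOCO model.
TOY_PWM = [
    [1.0, -1.0, -1.0, -1.0],   # position 1 prefers A
    [-1.0, 1.0, -1.0, -1.0],   # position 2 prefers C
    [-1.0, -1.0, 1.0, -1.0],   # position 3 prefers G
    [-1.0, -1.0, 1.0, -1.0],   # position 4 prefers G
]


def word_score(pwm, word):
    """Sum the weight of each base of `word` at its motif position."""
    return sum(row[ALPHABET.index(base)] for row, base in zip(pwm, word))


def scan(pwm, seq, threshold):
    """Return (start, strand, score) for every window scoring >= threshold."""
    k = len(pwm)
    hits = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k].upper()
        if any(base not in ALPHABET for base in word):
            continue  # skip windows containing N or other ambiguity codes
        fwd = word_score(pwm, word)
        rev = word_score(pwm, word.translate(COMPLEMENT)[::-1])  # reverse complement
        if fwd >= threshold:
            hits.append((i, "+", fwd))
        if rev >= threshold:
            hits.append((i, "-", rev))
    return hits


def score_distribution(pwm, background=UNIFORM, resolution=1000):
    """Distribution of integer-discretized scores of a random word, via DP over positions."""
    dist = {0: 1.0}
    for row in pwm:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for weight, q in zip(row, background):
                nxt[s + round(weight * resolution)] += p * q
        dist = dict(nxt)
    return dist


def pvalue(pwm, threshold, background=UNIFORM, resolution=1000):
    """P(score >= threshold) for one random word; approximate for non-integer weights."""
    dist = score_distribution(pwm, background, resolution)
    cutoff = round(threshold * resolution)
    return sum(p for s, p in dist.items() if s >= cutoff)


def threshold_for_pvalue(pwm, target, background=UNIFORM, resolution=1000):
    """Lowest achievable score threshold whose P-value is <= target, or None."""
    dist = score_distribution(pwm, background, resolution)
    tail, best = 0.0, None
    for s in sorted(dist, reverse=True):
        if tail + dist[s] > target:
            break
        tail += dist[s]
        best = s
    return None if best is None else best / resolution


if __name__ == "__main__":
    t = threshold_for_pvalue(TOY_PWM, 0.01)
    print(t)                                    # 4.0
    print(scan(TOY_PWM, "TACGGTCCGTA", t))      # [(1, '+', 4.0), (6, '-', 4.0)]
```

Discretizing the weights with `resolution` keeps the dynamic programming table small, at the cost of making P-values approximate for real-valued matrices. Extending the sketch to diPWMs would mean scoring overlapping dinucleotides instead of single bases, and the P-value recursion would then also have to track the previous nucleotide.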
Pages: D116-D125
Number of pages: 10