HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors

Cited by: 21
Authors
Vorontsov, Ilya E. [1]
Eliseeva, Irina A. [2]
Zinkevich, Arsenii [1,3]
Nikonov, Mikhail [3]
Abramov, Sergey [1,4]
Boytsov, Alexandr [1,4]
Kamenets, Vasily [1,5,6]
Kasianova, Alexandra [7,8]
Kolmykov, Semyon [9]
Yevshin, Ivan S. [10]
Favorov, Alexander [11]
Medvedeva, Yulia A. [12]
Jolma, Arttu [13]
Kolpakov, Fedor [14]
Makeev, Vsevolod J. [1,5,6]
Kulakovskiy, Ivan V. [1,2,15]
Affiliations
[1] Russian Acad Sci, Vavilov Inst Gen Genet, Moscow 119991, Russia
[2] Russian Acad Sci, Inst Prot Res, Pushchino 142290, Russia
[3] Lomonosov Moscow State Univ, Fac Bioengn & Bioinformat, Moscow 119991, Russia
[4] Altius Inst Biomed Sci, Seattle, WA 98121 USA
[5] Moscow Inst Phys & Technol, Dolgoprudnyi 141700, Russia
[6] Russian Acad Sci, Ufa Fed Res Ctr, Inst Biochem & Genet, Ufa 450054, Russia
[7] Skolkovo Inst Sci & Technol, Moscow 121205, Russia
[8] Russian Acad Sci, Inst Informat Transmiss Problems, Moscow 127051, Russia
[9] Sirius Univ Sci & Technol, Dept Computat Biol, Sirius 354340, Russia
[10] Biosoft Ru LLC, Novosibirsk 630090, Russia
[11] Johns Hopkins Univ, Sch Med, Baltimore, MD 21205 USA
[12] Russian Acad Sci, Res Ctr Biotechnol RAS, Moscow 119071, Russia
[13] Univ Toronto, Donnelly Ctr, Toronto, ON M5S 3E1, Canada
[14] Fed Res Ctr Informat & Computat Technol, Bioinformat Lab, Novosibirsk 630090, Russia
[15] Kazan Fed Univ, Inst Fundamental Med & Biol, Lab Regulatory Genom, Kazan 420008, Russia
Keywords
SITES; IDENTIFICATION; CLASSIFICATION; TFCLASS; MOTIFS
DOI
10.1093/nar/gkad1077
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was supplied to the automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.
Pages: D154-D163
Number of pages: 10
Related papers
49 in total
[1]   Landscape of allele-specific transcription factor binding in the human genome [J].
Abramov, Sergey ;
Boytsov, Alexandr ;
Bykova, Daria ;
Penzar, Dmitry D. ;
Yevshin, Ivan ;
Kolmykov, Semyon K. ;
Fridman, Marina, V ;
Favorov, Alexander, V ;
Vorontsov, Ilya E. ;
Baulin, Eugene ;
Kolpakov, Fedor ;
Makeev, Vsevolod J. ;
Kulakovskiy, Ivan, V .
NATURE COMMUNICATIONS, 2021, 12 (01)
[2]   Promoter Analysis Reveals Globally Differential Regulation of Human Long Non-Coding RNA and Protein-Coding Genes [J].
Alam, Tanvir ;
Medvedeva, Yulia A. ;
Jia, Hui ;
Brown, James B. ;
Lipovich, Leonard ;
Bajic, Vladimir B. .
PLOS ONE, 2014, 9 (10)
[3]   Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study [J].
Ambrosini, Giovanna ;
Vorontsov, Ilya ;
Penzar, Dmitry ;
Groux, Romain ;
Fornes, Oriol ;
Nikolaeva, Daria D. ;
Ballester, Benoit ;
Grau, Jan ;
Grosse, Ivo ;
Makeev, Vsevolod ;
Kulakovskiy, Ivan ;
Bucher, Philipp .
GENOME BIOLOGY, 2020, 21 (01)
[4]   Inferring direct DNA binding from ChIP-seq [J].
Bailey, Timothy L. ;
Machanick, Philip .
NUCLEIC ACIDS RESEARCH, 2012, 40 (17) :e128
[5]   Boytsov, Alexandr, 2022, F1000Res, V11, P33, DOI 10.12688/f1000research.75471.1
[6]   ANANASTRA: annotation and enrichment analysis of allele-specific transcription factor binding at SNPs [J].
Boytsov, Alexandr ;
Abramov, Sergey ;
Aiusheeva, Ariuna Z. ;
Kasianova, Alexandra M. ;
Baulin, Eugene ;
Kuznetsov, Ivan A. ;
Aulchenko, Yurii S. ;
Kolmykov, Semyon ;
Yevshin, Ivan ;
Kolpakov, Fedor ;
Vorontsov, Ilya E. ;
Makeev, Vsevolod J. ;
Kulakovskiy, Ivan, V .
NUCLEIC ACIDS RESEARCH, 2022, 50 (W1) :W51-W56
[7]   JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles [J].
Castro-Mondragon, Jaime A. ;
Riudavets-Puig, Rafael ;
Rauluseviciute, Ieva ;
Lemma, Roza Berhanu ;
Turchi, Laura ;
Blanc-Mathieu, Romain ;
Lucas, Jeremy ;
Boddie, Paul ;
Khan, Aziz ;
Perez, Nicolas Manosalva ;
Fornes, Oriol ;
Leung, Tiffany Y. ;
Aguirre, Alejandro ;
Hammal, Fayrouz ;
Schmelter, Daniel ;
Baranasic, Damir ;
Ballester, Benoit ;
Sandelin, Albin ;
Lenhard, Boris ;
Vandepoele, Klaas ;
Wasserman, Wyeth W. ;
Parcy, Francois ;
Mathelier, Anthony .
NUCLEIC ACIDS RESEARCH, 2022, 50 (D1) :D165-D173
[8]   RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections [J].
Castro-Mondragon, Jaime Abraham ;
Jaeger, Sebastien ;
Thieffry, Denis ;
Thomas-Chollier, Morgane ;
van Helden, Jacques .
NUCLEIC ACIDS RESEARCH, 2017, 45 (13)
[9]   gDesigner: computational design of synthetic gRNAs for Cas12a-based transcriptional repression in mammalian cells [J].
Crone, Michael A. ;
MacDonald, James T. ;
Freemont, Paul S. ;
Siciliano, Velia .
NPJ SYSTEMS BIOLOGY AND APPLICATIONS, 2022, 8 (01)
[10]   Enhanced C/EBP binding to G•T mismatches facilitates fixation of CpG mutations in cancer and adult stem cells [J].
Ershova, Anna S. ;
Eliseeva, Irina A. ;
Nikonov, Oleg S. ;
Fedorova, Alla D. ;
Vorontsov, Ilya E. ;
Papatsenko, Dmitry ;
Kulakovskiy, Ivan, V .
CELL REPORTS, 2021, 35 (10)