STREME: accurate and versatile sequence motif discovery

Cited by: 290
Authors
Bailey, Timothy L. [1]
Affiliations
[1] Univ Nevada, Dept Pharmacol, Reno, NV 89557 USA
Funding
National Institutes of Health (USA);
Keywords
BINDING SITES;
DOI
10.1093/bioinformatics/btab203
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Sequence motif discovery algorithms can identify novel sequence patterns that perform biological functions in DNA, RNA and protein sequences, for example the binding site motifs of DNA- and RNA-binding proteins.
Results: The STREME algorithm presented here advances the state of the art in ab initio motif discovery in terms of both accuracy and versatility. Using in vivo DNA (ChIP-seq) and RNA (CLIP-seq) data, and validating motifs with reference motifs derived from in vitro data, we show that STREME is more accurate, sensitive and thorough than several widely used algorithms (DREME, HOMER, MEME, Peak-motifs) and two other representative algorithms (ProSampler and Weeder). STREME's capabilities include the ability to find motifs in datasets with hundreds of thousands of sequences, to find both short and long motifs (from 3 to 30 positions), to perform differential motif discovery in pairs of sequence datasets, and to find motifs in sequences over virtually any alphabet (DNA, RNA, protein and user-defined alphabets). Unlike most motif discovery algorithms, STREME reports a useful estimate of the statistical significance of each motif it discovers. STREME is easy to use individually via its web server or via the command line, and is completely integrated with the widely used MEME Suite of sequence analysis tools. The name STREME stands for 'Simple, Thorough, Rapid, Enriched Motif Elicitation'.
Pages: 2834-2840
Number of pages: 7