A general approach for discriminative de novo motif discovery from high-throughput data

Cited by: 32
Authors
Grau, Jan [1]
Posch, Stefan [1]
Grosse, Ivo [1]
Keilwagen, Jens [2, 3]
Affiliations
[1] Univ Halle Wittenberg, Inst Comp Sci, D-06099 Halle, Saale, Germany
[2] Fed Res Ctr Cultivated Plants, Julius Kuhn Inst, Inst Biosafety Plant Biotechnol, D-06484 Quedlinburg, Germany
[3] Leibniz Inst Plant Genet & Crop Plant Res IPK, Dept Mol Genet, D-06466 Seeland Ot Gatersleben, Germany
Keywords
PROTEIN-DNA INTERACTIONS; CHIP-SEQ DATA; FACTOR-BINDING SITES; TRANSCRIPTION FACTOR; POSITIONAL INFORMATION; GENOME; SPECIFICITY; RESOLUTION; SEQUENCES; NETWORK;
DOI
10.1093/nar/gkt831
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches work perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research.
Pages: 11