A general approach for discriminative de novo motif discovery from high-throughput data

Times cited: 32
Authors
Grau, Jan [1]
Posch, Stefan [1]
Grosse, Ivo [1]
Keilwagen, Jens [2,3]
Affiliations
[1] Univ Halle Wittenberg, Inst Comp Sci, D-06099 Halle, Saale, Germany
[2] Fed Res Ctr Cultivated Plants, Julius Kuhn Inst, Inst Biosafety Plant Biotechnol, D-06484 Quedlinburg, Germany
[3] Leibniz Inst Plant Genet & Crop Plant Res IPK, Dept Mol Genet, D-06466 Seeland Ot Gatersleben, Germany
Keywords
PROTEIN-DNA INTERACTIONS; CHIP-SEQ DATA; FACTOR-BINDING SITES; TRANSCRIPTION FACTOR; POSITIONAL INFORMATION; GENOME; SPECIFICITY; RESOLUTION; SEQUENCES; NETWORK;
DOI
10.1093/nar/gkt831
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches work perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research.
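The abstract's closing observation, that modeling intra-motif dependencies may increase accuracy, can be illustrated with a small sketch. This is not Dimont's implementation. It assumes a toy set of aligned binding sites invented for illustration and compares a position weight matrix (PWM), which treats motif positions as independent, with an inhomogeneous first-order Markov model, which conditions each nucleotide on its predecessor. The function names `train_pwm`, `train_markov1`, `log_prob_pwm`, `log_prob_markov1` and the pseudocount of 0.5 are hypothetical choices.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train_pwm(sites, pseudo=0.5):
    """PWM: an independent nucleotide distribution for each motif position."""
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        total = len(sites) + 4 * pseudo
        pwm.append({a: (counts[a] + pseudo) / total for a in ALPHABET})
    return pwm


def train_markov1(sites, pseudo=0.5):
    """Inhomogeneous order-1 Markov model: P(x_i | x_{i-1}) estimated per position."""
    first = Counter(s[0] for s in sites)
    model = [{a: (first[a] + pseudo) / (len(sites) + 4 * pseudo) for a in ALPHABET}]
    for i in range(1, len(sites[0])):
        pairs = Counter((s[i - 1], s[i]) for s in sites)
        cond = {}
        for prev in ALPHABET:
            n_prev = sum(pairs[(prev, a)] for a in ALPHABET)
            cond[prev] = {
                a: (pairs[(prev, a)] + pseudo) / (n_prev + 4 * pseudo) for a in ALPHABET
            }
        model.append(cond)
    return model


def log_prob_pwm(pwm, seq):
    """Log-probability of seq under the PWM (sum over independent positions)."""
    return sum(math.log(pwm[i][c]) for i, c in enumerate(seq))


def log_prob_markov1(model, seq):
    """Log-probability of seq under the order-1 model (chain of conditionals)."""
    lp = math.log(model[0][seq[0]])
    for i in range(1, len(seq)):
        lp += math.log(model[i][seq[i - 1]][seq[i]])
    return lp


# Invented toy sites with a dependency: position 2 is G when position 1 is A,
# and C when position 1 is C.
sites = ["AAGT", "ACCT", "AAGT", "ACCT", "AAGT", "ACCT"]
pwm = train_pwm(sites)
mm = train_markov1(sites)

for seq in ["AAGT", "AACT"]:
    print(seq, round(log_prob_pwm(pwm, seq), 3), round(log_prob_markov1(mm, seq), 3))
```

In this toy case the PWM gives "AAGT" (an observed combination) and "AACT" (an unobserved one) the same score, because it only sees the per-position frequencies, whereas the first-order model scores "AAGT" clearly higher because it captures the pairwise dependency between positions 1 and 2.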
Pages: 11
Related papers (50 in total)
  • [31] Discovery of a Novel General Anesthetic Chemotype Using High-throughput Screening
    McKinstry-Wu, Andrew R.
    Bu, Weiming
    Rai, Ganesha
    Lea, Wendy A.
    Weiser, Brian P.
    Liang, David F.
    Simeonov, Anton
    Jadhav, Ajit
    Maloney, David J.
    Eckenhoff, Roderic G.
    ANESTHESIOLOGY, 2015, 122 (02) : 325 - 333
  • [32] Data processing for high-throughput mass spectrometry in drug discovery
    Liu, Chang
    Zhang, Hui
    EXPERT OPINION ON DRUG DISCOVERY, 2024, 19 (07) : 815 - 825
  • [33] A review of ensemble methods for de novo motif discovery in ChIP-Seq data
    Lihu, Andrei
    Holban, Stefan
    BRIEFINGS IN BIOINFORMATICS, 2015, 16 (06) : 964 - 973
  • [35] A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry
    Pan, Chongle
    Park, Byung H.
    McDonald, William H.
    Carey, Patricia A.
    Banfield, Jillian F.
    VerBerkmoes, Nathan C.
    Hettich, Robert L.
    Samatova, Nagiza F.
    BMC BIOINFORMATICS, 2010, 11
  • [36] BayesMotif: de novo protein sorting motif discovery from impure datasets
    Hu, Jianjun
    Zhang, Fan
    BMC BIOINFORMATICS, 2010, 11
  • [38] Greedy de novo motif discovery to construct motif repositories for bacterial proteomes
    Khakzad, Hamed
    Malmstrom, Johan
    Malmstrom, Lars
    BMC BIOINFORMATICS, 2019, 20 (Suppl 4)
  • [40] ExpoSeq: simplified analysis of high-throughput sequencing data from antibody discovery campaigns
    Sorensen, Christoffer V.
    Hofmann, Nils
    Rawat, Puneet
    Sorensen, Frederik V.
    Ljungars, Anne
    Greiff, Victor
    Laustsen, Andreas H.
    Jenkins, Timothy P.
    BIOINFORMATICS ADVANCES, 2024, 4 (01):