ProSampler: an ultrafast and accurate motif finder in large ChIP-seq datasets for combinatory motif discovery

Times cited: 19
Authors
Li, Yang [1, 2]
Ni, Pengyu [2 ]
Zhang, Shaoqiang [3 ]
Li, Guojun [1, 2]
Su, Zhengchang [2 ]
Affiliations
[1] Shandong Univ, Sch Math, Jinan 250100, Shandong, Peoples R China
[2] Univ North Carolina Charlotte, Coll Comp & Informat, Dept Bioinformat & Genom, Charlotte, NC 28223 USA
[3] Tianjin Normal Univ, Coll Comp & Informat Engn, Tianjin 300387, Peoples R China
Funding
US National Science Foundation; National Natural Science Foundation of China
Keywords
TRANSCRIPTION COFACTORS; DATABASE; ALGORITHM; UPDATE;
DOI
10.1093/bioinformatics/btz290
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: The availability of numerous ChIP-seq datasets for transcription factors (TFs) has provided an unprecedented opportunity to identify all TF binding sites in genomes. However, progress has been hindered by the lack of a highly efficient and accurate tool to find not only the target motifs, but also cooperative motifs in very large datasets. Results: We herein present an ultrafast and accurate motif-finding algorithm, ProSampler, based on a novel numeration method and Gibbs sampler. ProSampler runs orders of magnitude faster than the fastest existing tools while often more accurately identifying motifs of both the target TFs and cooperators. Thus, ProSampler can greatly facilitate the efforts to identify the entire cis-regulatory code in genomes.
Pages: 4632-4639
Page count: 8