A review of ensemble methods for de novo motif discovery in ChIP-Seq data

Cited by: 24
Authors
Lihu, Andrei [1]
Holban, Stefan [2]
Affiliations
[1] Politehn Univ Timisoara, Timisoara, Romania
[2] Politehn Univ Timisoara, Comp Sci, Timisoara, Romania
Keywords
next-generation sequencing; motif discovery; ensemble methods; ChIP-Seq; transcription factors; CIS-REGULATORY ELEMENTS; FACTOR-BINDING SITES; DNA-SEQUENCE MOTIFS; TRANSCRIPTION FACTOR; CHROMATIN-IMMUNOPRECIPITATION; FINDING ALGORITHM; HUMAN GENOME; MEME-CHIP; WEB TOOL; IDENTIFICATION
DOI
10.1093/bib/bbv022
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
De novo motif discovery is a difficult computational task. Historically, dedicated algorithms always reported a high percentage of false positives. Their performance did not improve considerably even after they adapted to handle large amounts of chromatin immunoprecipitation sequencing (ChIP-Seq) data. Several studies have advocated aggregating complementary algorithms, combining their predictions to increase the accuracy of the results. This led to the development of ensemble methods. To form a better view on modern ensembles, we review all compound tools designed for ChIP-Seq. After a brief introduction to basic algorithms and early ensembles, we describe the most recent tools. We highlight their limitations and strengths by presenting their architecture, the input options and their output. To provide guidance for next-generation sequencing practitioners, we observe the differences and similarities between them. Last but not least, we identify and recommend several features to be implemented by any novel ensemble algorithm.
Pages: 964-973
Page count: 10