A review of ensemble methods for de novo motif discovery in ChIP-Seq data

Cited by: 24
Authors
Lihu, Andrei [1]
Holban, Stefan [2]
Affiliations
[1] Politehn Univ Timisoara, Timisoara, Romania
[2] Politehn Univ Timisoara, Comp Sci, Timisoara, Romania
Keywords
next-generation sequencing; motif discovery; ensemble methods; ChIP-Seq; transcription factors; CIS-REGULATORY ELEMENTS; FACTOR-BINDING SITES; DNA-SEQUENCE MOTIFS; TRANSCRIPTION FACTOR; CHROMATIN-IMMUNOPRECIPITATION; FINDING ALGORITHM; HUMAN GENOME; MEME-CHIP; WEB TOOL; IDENTIFICATION
DOI
10.1093/bib/bbv022
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
De novo motif discovery is a difficult computational task. Historically, dedicated algorithms always reported a high percentage of false positives. Their performance did not improve considerably even after they adapted to handle large amounts of chromatin immunoprecipitation sequencing (ChIP-Seq) data. Several studies have advocated aggregating complementary algorithms, combining their predictions to increase the accuracy of the results. This led to the development of ensemble methods. To form a better view on modern ensembles, we review all compound tools designed for ChIP-Seq. After a brief introduction to basic algorithms and early ensembles, we describe the most recent tools. We highlight their limitations and strengths by presenting their architecture, the input options and their output. To provide guidance for next-generation sequencing practitioners, we observe the differences and similarities between them. Last but not least, we identify and recommend several features to be implemented by any novel ensemble algorithm.
Pages: 964-973
Page count: 10