An algorithmic perspective of de novo cis-regulatory motif finding based on ChIP-seq data

Cited by: 31
Authors
Liu, Bingqiang [2]
Yang, Jinyu [3]
Li, Yang [2]
McDermaid, Adam [3]
Ma, Qin [1, 4]
Affiliations
[1] South Dakota State Univ, Dept Agron Hort & Plant Sci, Brookings, SD 57007 USA
[2] Shandong Univ, Sch Math, Jinan, Shandong, Peoples R China
[3] South Dakota State Univ, Dept Math & Stat, Brookings, SD 57007 USA
[4] South Dakota State Univ, Bioinformat & Math Biosci Lab, Brookings, SD 57007 USA
Funding
National Science Foundation (USA);
Keywords
cis-regulatory elements; ChIP-seq; motif finding; algorithm; FACTOR-BINDING SITES; TRANSCRIPTION FACTOR; COREGULATED GENES; ACID SEQUENCES; DNA; DISCOVERY; IDENTIFICATION; ELEMENTS; TOOL; PREDICTION;
DOI
10.1093/bib/bbx026
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Transcription factors are proteins that bind to specific DNA sequences and play important roles in controlling the expression levels of their target genes. Hence, prediction of transcription factor binding sites (TFBSs) provides a solid foundation for inferring gene regulatory mechanisms and building regulatory networks for a genome. Chromatin immunoprecipitation sequencing (ChIP-seq) technology can generate large-scale experimental data for such protein-DNA interactions, providing an unprecedented opportunity to identify TFBSs (a.k.a. cis-regulatory motifs). The bottleneck, however, is the lack of robust mathematical models, as well as efficient computational methods for TFBS prediction to make effective use of massive ChIP-seq data sets in the public domain. The purpose of this study is to review existing motif-finding methods for ChIP-seq data from an algorithmic perspective and provide new computational insight into this field. The state-of-the-art methods were shown through summarizing eight representative motif-finding algorithms along with corresponding challenges, and introducing some important relative functions according to specific biological demands, including discriminative motif finding and cofactor motifs analysis. Finally, potential directions and plans for ChIP-seq-based motif-finding tools were showcased in support of future algorithm development.
Pages: 1069-1081
Number of pages: 13