LASAGNA: A novel algorithm for transcription factor binding site alignment

Cited by: 23
Authors
Lee, Chih [1]
Huang, Chun-Hsi [1]
Affiliation
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
Source
BMC BIOINFORMATICS | 2013 | Vol. 14
Funding
National Science Foundation (USA);
Keywords
GENE-REGULATION; CLUSTAL-W; DATABASE; DNA; PROMOTER; MOTIFS; TOOL; SEQUENCES; PATTERNS; ELEMENTS;
DOI
10.1186/1471-2105-14-108
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned to build a PSSM. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. As work on TFBS alignment algorithms has been limited, it is highly desirable to have an alignment algorithm tailored to TFBSs.
Results: We designed a novel algorithm named LASAGNA, which is aware of the lengths of input TFBSs and utilizes position dependence. Results on 189 TFs of 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method dependent on LASAGNA to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we described LASAGNA-ChIP, a more sophisticated version for ChIP (chromatin immunoprecipitation) experiments. Under the one-per-sequence model, it showed comparable performance with MEME in discovering motifs in ChIP-seq peak sequences.
Conclusions: We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly webtool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs and 133 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and the ORegAnno database (08Nov10 dump), respectively. The webtool is available at http://biogrid.engr.uconn.edu/lasagna_search/.
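
As an illustration of why aligned sites are needed before a PSSM can be built, the following is a minimal sketch of constructing a log-odds PSSM from sites that are already aligned and gap-free, then scanning a sequence with it. It is not the LASAGNA algorithm, which addresses the alignment step itself. The function names `build_pssm` and `scan`, the pseudocount of 1, the uniform background of 0.25, and the example sites are all assumptions made for this sketch and do not come from the paper.

```python
import math


def build_pssm(aligned_sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM from equal-length, gap-free aligned sites.

    Assumes sites contain only A, C, G, T (any other symbol raises KeyError).
    The pseudocount and uniform background are illustrative defaults.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    pssm = []
    for pos in range(length):
        # Start every base at the pseudocount so no frequency is zero.
        counts = {base: pseudocount for base in "ACGT"}
        for site in aligned_sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against the background.
        pssm.append(
            {base: math.log2((count / total) / background)
             for base, count in counts.items()}
        )
    return pssm


def scan(sequence, pssm):
    """Return (best_score, start) over all windows of the PSSM width."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        score = sum(column[base] for column, base in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical aligned sites, chosen only to exercise the functions.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pssm = build_pssm(sites)
best_score, best_start = scan("CCCCTGACTCACCCC", pssm)
```

The column-by-column counting only makes sense when position `pos` refers to the same position of the binding site in every input, which is why unaligned variable-length segments have to be aligned first.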
Pages: 13