qPMS7: A Fast Algorithm for Finding (l, d)-Motifs in DNA and Protein Sequences

被引:34
作者
Dinh, Hieu [1 ]
Rajasekaran, Sanguthevar [1 ]
Davila, Jaime [2 ]
机构
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
[2] Mayo Clin, Div Biomed Stat & Informat, Rochester, MN USA
基金
美国国家科学基金会;
关键词
MOTIFS;
D O I
10.1371/journal.pone.0041425
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motifs discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are very crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the (l, d)-motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS Problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worse case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our Algorithm qPMS7 is on an average 5 times faster than the state-of-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com.
引用
收藏
页数:8
相关论文
共 28 条
[1]  
Bailey T L, 1994, Proc Int Conf Intell Syst Mol Biol, V2, P28
[2]  
Bandyopadhyay S, 2012, P 2 IEEE INT C COMP, P1
[3]  
Blanchette M, 2000, Proc Int Conf Intell Syst Mol Biol, V8, P37
[4]  
Blanchette M., 2001, P 5 ANN INT C COMP M, P49
[5]   Predicting gene regulatory elements in silico on a genomic scale [J].
Brazma, A ;
Jonassen, I ;
Vilo, J ;
Ukkonen, E .
GENOME RESEARCH, 1998, 8 (11) :1202-1215
[6]  
Buhler Jeremy., 2001, J COMPUT BIOL, P69
[7]  
Chin F., 2005, Proceedings of the 3rd Asia-Pacific Bioinformatics Conference, P261
[8]  
Davila J, 2007, AB BAF MU MAG KASH B
[9]   Fast and practical algorithms for planted (l, d) motif search [J].
Davila, Jaime ;
Balla, Sudha ;
Rajasekaran, Sanguthevar .
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2007, 4 (04) :544-552
[10]   PMS5: an efficient exact algorithm for the (l, d)-motif finding problem [J].
Dinh, Hieu ;
Rajasekaran, Sanguthevar ;
Kundeti, Vamsi K. .
BMC BIOINFORMATICS, 2011, 12