SMOTER, A Structured Motif Finder Based on an Exhaustive Tree-Based Algorithm

Cited by: 1
Authors
Sheikhizadeh, Siavash [1]
Hosseini, Samin [2]
Affiliations
[1] Shahid Bahonar Univ, Kerman, Iran
[2] Vali E Asr Univ Rafsanjan, Dept Plant Protect, Kerman, Iran
Keywords
Exhaustive algorithm; e-mutated set; Geminiviridae; l-mer trie; structured motif; tymovirus; IDENTIFICATION; PROMOTER; SEQUENCE; BINDING
DOI
10.2174/1574893608999140109122231
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
In this paper, an exhaustive algorithm for extracting structured motifs is presented. A structured motif is defined as an ordered set of highly conserved, over-represented patterns that occur near each other in a set of DNA sequences. The algorithm is based on an innovative data structure called the l-mer trie. Unlike other existing motif finders, it offers more flexibility: a range can be specified for the length of each single pattern, for its substitution rate, and for the spacing between patterns. The ability to define a minimum bound for substitution rates saves considerable time and space when searching for weak motifs that occur with many mutations. The efficiency of the algorithm has been verified on artificial sequences as well as on real DNA sequences of some plant viruses. The results have been compared with those of RISO, another tree-based algorithm, which is claimed to have notable time and space gains over the best-known exact algorithms.
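The abstract names the l-mer trie and the notion of a structured motif but gives no implementation details. The Python sketch below illustrates the general idea under simplifying assumptions; it is not SMOTER's actual algorithm, and the function names (`build_lmer_trie`, `single_boxes`, `structured_motifs`) are hypothetical. It assumes two boxes of fixed lengths `k1` and `k2`, a maximum number of substitutions per box (`e1`, `e2`), a spacer of `dmin` to `dmax` symbols, and a quorum `q` of sequences; the length ranges and the minimum substitution bound described in the abstract are omitted. Each trie node stores the (sequence, start) pairs of the l-mers passing through it, and candidate models are spelled letter by letter while tracking the trie nodes reachable within the mismatch budget, pruning any prefix that falls below the quorum.

```python
ALPHABET = "ACGT"


class TrieNode:
    """Node of an l-mer trie. `occ` holds the (sequence id, start) pairs of
    every l-mer whose prefix spells the path from the root to this node."""

    def __init__(self):
        self.children = {}
        self.occ = set()


def build_lmer_trie(seqs, l):
    """Insert s[p:p+l] for every start p (shorter near the sequence end),
    so a node at depth k <= l lists every start of the k-mer it spells."""
    root = TrieNode()
    for sid, s in enumerate(seqs):
        for p in range(len(s)):
            node = root
            for ch in s[p:p + l]:
                node = node.children.setdefault(ch, TrieNode())
                node.occ.add((sid, p))
    return root


def n_seqs(occ):
    """Number of distinct sequences covered by a set of occurrences."""
    return len({sid for sid, _ in occ})


def single_boxes(root, k, e, q):
    """Exhaustively spell every length-k model over ALPHABET that occurs with
    at most e substitutions in at least q sequences. Returns {model: occ}."""
    results = {}

    def extend(model, frontier):
        # frontier: (trie node, mismatches so far) pairs at depth len(model)
        if len(model) == k:
            results[model] = set().union(*(node.occ for node, _ in frontier))
            return
        for a in ALPHABET:
            nxt = [(child, mm + (b != a))
                   for node, mm in frontier
                   for b, child in node.children.items()
                   if mm + (b != a) <= e]
            occ = set().union(*(node.occ for node, _ in nxt))
            if n_seqs(occ) >= q:  # quorum pruning of the model prefix
                extend(model + a, nxt)

    extend("", [(root, 0)])
    return results


def structured_motifs(seqs, k1, k2, e1, e2, dmin, dmax, q, l=None):
    """Pair box-1 and box-2 models whose occurrences are separated by a
    spacer of dmin..dmax symbols in at least q sequences."""
    l = l or max(k1, k2)
    root = build_lmer_trie(seqs, l)
    boxes1 = single_boxes(root, k1, e1, q)
    boxes2 = boxes1 if (k2, e2) == (k1, e1) else single_boxes(root, k2, e2, q)
    found = []
    for m1, occ1 in boxes1.items():
        for m2, occ2 in boxes2.items():
            # box 2 must start d symbols after the end of box 1
            hits = {sid for sid, i in occ1
                    if any((sid, i + k1 + d) in occ2
                           for d in range(dmin, dmax + 1))}
            if len(hits) >= q:
                found.append((m1, m2, sorted(hits)))
    return found
```

For example, `structured_motifs(["TTACGAAGGCTT", "CACGTTGGCA", "ACGCCGGC"], 3, 3, 0, 0, 2, 2, 3)` reports the pair `("ACG", "GGC")` in all three sequences. Pairing every surviving box-1 model with every box-2 model by brute force keeps the sketch short; a dedicated structured-motif finder would more likely extend the second box from the occurrences of the first, avoiding this quadratic pairing step.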
Pages: 34-43
Number of pages: 10