HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences

被引:6
作者
Le, Thanh [4 ]
Altman, Tom [4 ]
Gardiner, Katheleen [1 ,2 ,3 ]
机构
[1] Univ Colorado, Dept Pediat, Computat Biosci Program, Denver, CO 80202 USA
[2] Univ Colorado, Dept Pediat, Human Med Genet Program, Denver, CO 80202 USA
[3] Univ Colorado, Dept Pediat, Neurosci Program, Denver, CO 80202 USA
[4] Univ Colorado, Dept Comp Sci & Engn, Denver, CO 80202 USA
关键词
FACTOR-BINDING SITES; MATRICES; ELEMENTS;
D O I
10.1093/bioinformatics/btp676
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models. Results: We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences.
引用
收藏
页码:302 / 309
页数:8
相关论文
共 15 条
[1]  
Bailey T L, 1995, Proc Int Conf Intell Syst Mol Biol, V3, P21
[2]  
Bi CP, 2007, 2007 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, P275
[3]  
Chang X., 2006, 1st Conf. Ind. Elec. Apps, P1
[4]   Discovering sequence motifs with arbitrary insertions and deletions [J].
Frith, Martin C. ;
Saunders, Neil F. W. ;
Kobe, Bostjan ;
Bailey, Timothy L. .
PLOS COMPUTATIONAL BIOLOGY, 2008, 4 (05)
[5]  
Henikoff JG, 1996, COMPUT APPL BIOSCI, V12, P135
[6]  
Hong TP, 2008, J COMPUT, V3, P67
[7]   fdrMotif:: identifying cis-elements by an EM algorithm coupled with false discovery rate control [J].
Li, Leping ;
Bass, Robert L. ;
Liang, Yu .
BIOINFORMATICS, 2008, 24 (05) :629-636
[8]   GAPWM: a genetic algorithm method for optimizing a position weight matrix [J].
Li, Leping ;
Liang, Yu ;
Bass, Robert L. .
BIOINFORMATICS, 2007, 23 (10) :1188-1194
[9]   Motif discoveries in unaligned molecular sequences using self-organizing neural networks [J].
Liu, Derong ;
Xiong, Xiaoxu ;
DasGupta, Bhaskar ;
Zhang, Huaguang .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2006, 17 (04) :919-928
[10]   A new approach to the assessment of the quality of predictions of transcription factor binding sites [J].
Nowakowski, Szymon ;
Tiuryn, Jerzy .
JOURNAL OF BIOMEDICAL INFORMATICS, 2007, 40 (02) :139-149