Deterministic local alignment methods improved by a simple genetic algorithm

被引:10
作者
Bi, Chengpeng [1 ]
机构
[1] Univ Missouri, Bioinformat & Intelligent Comp Lab, Div Clin Pharmacol, Childrens Mercy Hosp,Sch Med Comp & Engn, Kansas City, MO 64108 USA
关键词
Expectation maximization (EM); Genetic algorithms (GA); Motif discovery; Multiple sequence local alignment; Memetic algorithms; MULTIPLE SEQUENCE ALIGNMENT; DATA AUGMENTATION ALGORITHMS; DNA-BINDING SITES; EM ALGORITHM; EXPECTATION MAXIMIZATION; REGULATORY PROTEINS; MIXTURE-MODELS; MOTIFS; IDENTIFICATION; STRATEGIES;
D O I
10.1016/j.neucom.2010.01.023
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Multiple sequence local alignment, often deployed for de nova discovery of biological motifs hidden in a set of DNA or protein sequences, remains a challenge in bioinformatics and computational biology. Many algorithms and software packages have been developed to address the problem. Expectation maximization (EM), one of the popular local alignment methods, is often used to solve the motif-finding problem. However, EM largely depends on its initialization and can be easily trapped in local optima. This paper presents the Genetic-enabled EM Motif-Finding Algorithm (GEMFA) in an effort to mitigate the difficulties confronted the EM-based motif discovery algorithms. The new algorithm integrates a simple genetic algorithm (GA) with a local searcher to explore the local alignment space, that is, it combines deterministic local alignment methods with a simple GA to effectively perform de novo motif discovery. It first initializes a population of multiple local alignments each of which is encoded on a chromosome that represents a potential solution. GEMFA then performs heuristic search in the whole alignment space using minimum distance length (MDL) as the fitness function, which is generalized from maximum log-likelihood. The genetic algorithm gradually moves this population towards the best alignment from which the motif model is derived. Simulated and real biological sequence analysis showed that GEMFA significantly improved deterministic local alignment methods especially in the subtle motif sequence alignment, and it also outperformed other algorithms tested. (C) 2010 Elsevier B.V. All rights reserved,
引用
收藏
页码:2394 / 2406
页数:13
相关论文
共 51 条
[1]  
[Anonymous], 2001, BIOINFORMATICS
[2]  
[Anonymous], 2001, An Introduction to Genetic Algorithms. Complex Adaptive Systems
[3]  
[Anonymous], J ROYAL STAT SOC B
[4]  
BAILEY TL, 1995, MACH LEARN, V21, P51, DOI 10.1007/BF00993379
[5]  
Bailey TL., 1994, Proc Int Conf Intel Syst Mol Biol, V2, P28
[7]   SELECTION OF DNA-BINDING SITES BY REGULATORY PROTEINS - STATISTICAL-MECHANICAL THEORY AND APPLICATION TO OPERATORS AND PROMOTERS [J].
BERG, OG ;
VONHIPPEL, PH .
JOURNAL OF MOLECULAR BIOLOGY, 1987, 193 (04) :723-743
[8]  
BI C, 2002, PATTERN CLASSIFICATI
[9]  
BI C, 2010, PATTERN RECOGN, DOI DOI 10.1016/J.PATREC.2009.005
[10]   Data augmentation algorithms for detecting conserved domains in protein sequences: A comparative study [J].
Bi, Chengpeng .
JOURNAL OF PROTEOME RESEARCH, 2008, 7 (01) :192-201