Comparison of optimization techniques for sequence pattern discovery by maximum-likelihood

Cited by: 4
Authors
Bi, Chengpeng [1, 2, 3]
Affiliations
[1] Univ Missouri, Childrens Mercy Hosp, Sch Med, Div Clin Pharmacol,Bioinformat & Intelligent Comp, Kansas City, MO 64108 USA
[2] Univ Missouri, Childrens Mercy Hosp, Sch Comp, Div Clin Pharmacol,Bioinformat & Intelligent Comp, Kansas City, MO 64108 USA
[3] Univ Missouri, Childrens Mercy Hosp, Sch Engn, Div Clin Pharmacol,Bioinformat & Intelligent Comp, Kansas City, MO 64108 USA
Keywords
Maximum-likelihood; Expectation maximization (EM); Markov chain Monte Carlo; Motif discovery; Multiple local alignment; Gene regulation; CIS-REGULATORY MODULES; DNA-BINDING SITES; MONTE-CARLO; EM ALGORITHM; DISTRIBUTIONS; CONVERGENCE; INFORMATION; ALIGNMENT; MODELS; MOTIFS
DOI
10.1016/j.patrec.2009.09.005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Among a set of observed relevant DNA sequences from co-regulated genes, there exist short, functional yet hidden subsequence patterns that recur across genomic sequences. The task of sequence pattern discovery, also known as motif discovery, is to uncover these unseen subsequences ab initio and then build a motif model for them. A plethora of motif algorithms has been designed to tackle this problem. This paper compares a set of optimization techniques by consolidating them under the same maximum-likelihood (ML) framework. The framework unifies a suite of motif-finding algorithms by maximizing the same function, which enables a systematic comparison of different optimization schemes and provides practical guidance on using these techniques. As a foundation, the ML framework is built for two categories of iterative optimization techniques (i.e., deterministic and stochastic) capable of exploring the sequence alignment space. The deterministic algorithms maximize the likelihood function by iteratively performing greedy local search. The stochastic algorithms iteratively draw motif location samples using Monte Carlo simulation while keeping track of solutions with local maximum likelihoods. A total of five ML-based sequence pattern-finding algorithms are developed, evaluated and compared using simulated and real biological sequences. Results show that the deterministic algorithms are more time-efficient than their stochastic counterparts, but their performance is not as good as that of the stochastic algorithms. (C) 2009 Elsevier B.V. All rights reserved.
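The contrast the abstract draws between deterministic and stochastic search can be illustrated with a minimal sketch. It is not any of the paper's five algorithms, which the abstract does not specify. It assumes one motif site per sequence, a uniform background, and a log-likelihood-ratio objective; all function names (build_pwm, site_weights, search) and the pseudocount value are hypothetical choices made for illustration. The deterministic branch greedily moves each site to its best-scoring position, while the stochastic branch samples the position in proportion to its likelihood ratio and keeps track of the best alignment seen so far.

```python
import math
import random

ALPHABET = "ACGT"

def build_pwm(seqs, starts, width, skip=None, pseudo=0.5):
    """Column-wise base frequencies of the aligned sites (with pseudocounts).
    The sequence at index `skip`, if given, is left out of the counts."""
    counts = [[pseudo] * 4 for _ in range(width)]
    for i, (s, a) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][ALPHABET.index(s[a + j])] += 1
    return [[c / sum(col) for c in col] for col in counts]

def log_likelihood_ratio(seqs, starts, width, background=0.25):
    """Log-likelihood of the aligned sites under the motif model relative to
    a uniform background (an assumed objective, not taken from the paper)."""
    pwm = build_pwm(seqs, starts, width)
    return sum(math.log(pwm[j][ALPHABET.index(s[a + j])] / background)
               for s, a in zip(seqs, starts) for j in range(width))

def site_weights(seq, pwm, width, background=0.25):
    """Likelihood ratio of every candidate start position in one sequence."""
    return [math.prod(pwm[j][ALPHABET.index(seq[a + j])] / background
                      for j in range(width))
            for a in range(len(seq) - width + 1)]

def search(seqs, width, stochastic, sweeps=50, seed=0):
    """Iteratively update one site per sequence and return the best alignment.
    stochastic=False: greedy local search (take the best-scoring position).
    stochastic=True: Monte Carlo draw proportional to the likelihood ratio."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    best, best_score = list(starts), log_likelihood_ratio(seqs, starts, width)
    for _ in range(sweeps):
        for i, s in enumerate(seqs):
            # Motif model estimated from all other sequences' current sites.
            pwm = build_pwm(seqs, starts, width, skip=i)
            w = site_weights(s, pwm, width)
            if stochastic:
                starts[i] = rng.choices(range(len(w)), weights=w)[0]
            else:
                starts[i] = max(range(len(w)), key=w.__getitem__)
        # Keep track of the highest-scoring alignment visited so far.
        score = log_likelihood_ratio(seqs, starts, width)
        if score > best_score:
            best, best_score = list(starts), score
    return best, best_score
```

Calling search(seqs, width, stochastic=False) and search(seqs, width, stochastic=True) on the same list of DNA strings gives a rough feel for the trade-off: the greedy variant tends to settle quickly and may stop at a local optimum, whereas the sampling variant keeps moving and relies on the tracked best alignment.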
Pages: 2147-2160
Page count: 14