Comparison of optimization techniques for sequence pattern discovery by maximum-likelihood

被引:4
作者
Bi, Chengpeng [1 ,2 ,3 ]
机构
[1] Univ Missouri, Childrens Mercy Hosp, Sch Med, Div Clin Pharmacol,Bioinformat & Intelligent Comp, Kansas City, MO 64108 USA
[2] Univ Missouri, Childrens Mercy Hosp, Sch Comp, Div Clin Pharmacol,Bioinformat & Intelligent Comp, Kansas City, MO 64108 USA
[3] Univ Missouri, Childrens Mercy Hosp, Sch Engn, Div Clin Pharmacol,Bioinformat & Intelligent Comp, Kansas City, MO 64108 USA
关键词
Maximum-likelihood; Expectation maximization (EM); Markov chain Monte Carlo; Motif discovery; Multiple local alignment; Gene regulation; CIS-REGULATORY MODULES; DNA-BINDING SITES; MONTE-CARLO; EM ALGORITHM; DISTRIBUTIONS; CONVERGENCE; INFORMATION; ALIGNMENT; MODELS; MOTIFS;
D O I
10.1016/j.patrec.2009.09.005
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Among a set of observed relevant DNA sequences coming from a set of co-regulated genes, there exist some short, functional yet hidden sub-sequence patterns which recurrently appear across genomic sequences. The task of sequence pattern discovery, also known as motif discovery, is to uncover these unseen subsequences ab initio and then build a motif model for them. A plethora of motif algorithms has been designed to tackle this problem. This paper aims to compare a set of optimization techniques by consolidating them under the same maximum-likelihood (ML) framework. The framework unifies a suite of motif-finding algorithms by maximizing the same function, that enables a systematic comparison of different optimization schemes as well as provision of practical guidance on using these techniques. As a foundation, the ML framework is built for two categories of iterative optimization techniques (i.e. deterministic and stochastic) capable of exploring the sequence alignment space. The deterministic algorithms are to maximize the likelihood function by performing iteratively greedy local search. The stochastic algorithms are to iteratively draw motif location samples using Monte Carlo simulation and simultaneously keep track of solutions with local maximum-likelihoods. A total of five ML-based sequence pattern-finding algorithms are developed, evaluated and compared using simulated and real biological sequences. Results show that deterministic algorithms are more time-efficient than its stochastic counterparts, but their performance is not as good as the stochastic algorithms. (C) 2009 Elsevier B.V. All rights reserved.
引用
收藏
页码:2147 / 2160
页数:14
相关论文
共 56 条
  • [41] PEVZNER P, 2000, P 1 ISMB C, V1, P269
  • [42] microRNA target predictions in animals
    Rajewsky, Nikolaus
    [J]. NATURE GENETICS, 2006, 38 (Suppl 6) : S8 - S13
  • [43] Genome-wide location and function of DNA binding proteins
    Ren, B
    Robert, F
    Wyrick, JJ
    Aparicio, O
    Jennings, EG
    Simon, I
    Zeitlinger, J
    Schreiber, J
    Hannett, N
    Kanin, E
    Volkert, TL
    Wilson, CJ
    Bell, SP
    Young, RA
    [J]. SCIENCE, 2000, 290 (5500) : 2306 - +
  • [44] SANDELIN A, 2004, NUCLEIC ACIDS RES, V91, pD94
  • [45] IDENTIFYING PROTEIN-BINDING SITES FROM UNALIGNED DNA FRAGMENTS
    STORMO, GD
    HARTZELL, GW
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1989, 86 (04) : 1183 - 1187
  • [46] DNA binding sites: representation and discovery
    Stormo, GD
    [J]. BIOINFORMATICS, 2000, 16 (01) : 16 - 23
  • [47] NONUNIVERSAL CRITICAL-DYNAMICS IN MONTE-CARLO SIMULATIONS
    SWENDSEN, RH
    WANG, JS
    [J]. PHYSICAL REVIEW LETTERS, 1987, 58 (02) : 86 - 88
  • [48] THE CALCULATION OF POSTERIOR DISTRIBUTIONS BY DATA AUGMENTATION
    TANNER, MA
    WING, HW
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1987, 82 (398) : 528 - 540
  • [49] Assessing computational tools for the discovery of transcription factor binding sites
    Tompa, M
    Li, N
    Bailey, TL
    Church, GM
    De Moor, B
    Eskin, E
    Favorov, AV
    Frith, MC
    Fu, YT
    Kent, WJ
    Makeev, VJ
    Mironov, AA
    Noble, WS
    Pavesi, G
    Pesole, G
    Régnier, M
    Simonis, N
    Sinha, S
    Thijs, G
    van Helden, J
    Vandenbogaert, M
    Weng, ZP
    Workman, C
    Ye, C
    Zhu, Z
    [J]. NATURE BIOTECHNOLOGY, 2005, 23 (01) : 137 - 144
  • [50] SYSTEMATIC EVOLUTION OF LIGANDS BY EXPONENTIAL ENRICHMENT - RNA LIGANDS TO BACTERIOPHAGE-T4 DNA-POLYMERASE
    TUERK, C
    GOLD, L
    [J]. SCIENCE, 1990, 249 (4968) : 505 - 510