A mixed integer linear programming model to reconstruct phylogenies from single nucleotide polymorphism haplotypes under the maximum parsimony criterion

被引:5
作者
Catanzaro, Daniele [1 ]
Ravi, Ramamoorthi [2 ]
Schwartz, Russell [3 ,4 ]
机构
[1] Univ Libre Bruxelles, Dept Comp Sci, B-1050 Brussels, Belgium
[2] Carnegie Mellon Univ, Tepper Sch Business, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Dept Biol Sci, Pittsburgh, PA 15213 USA
[4] Carnegie Mellon Univ, Lane Ctr Computat Biol, Pittsburgh, PA 15213 USA
基金
美国国家卫生研究院; 美国安德鲁·梅隆基金会;
关键词
Combinatorial optimization; Exact algorithms; Mixed integer programming; Phylogeny estimation; Haplotype estimation; Maximum parsimony; Single nucleotide polymorphism; Computational biology; STEINER TREE PROBLEM; PERFECT PHYLOGENY; TIME ALGORITHM; EVOLUTION;
D O I
10.1186/1748-7188-8-3
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Background: Phylogeny estimation from aligned haplotype sequences has attracted more and more attention in the recent years due to its importance in analysis of many fine-scale genetic data. Its application fields range from medical research, to drug discovery, to epidemiology, to population dynamics. The literature on molecular phylogenetics proposes a number of criteria for selecting a phylogeny from among plausible alternatives. Usually, such criteria can be expressed by means of objective functions, and the phylogenies that optimize them are referred to as optimal. One of the most important estimation criteria is the parsimony which states that the optimal phylogeny T* for a set H of n haplotype sequences over a common set of variable loci is the one that satisfies the following requirements: (i) it has the shortest length and (ii) it is such that, for each pair of distinct haplotypes hi, hj is an element of H, the sum of the edge weights belonging to the path from hi to hj in T* is not smaller than the observed number of changes between hi and hj. Finding the most parsimonious phylogeny for H involves solving an optimization problem, called the Most Parsimonious Phylogeny Estimation Problem (MPPEP), which isNP-hard in many of its versions. Results: In this article we investigate a recent version of the MPPEP that arises when input data consist of single nucleotide polymorphism haplotypes extracted from a population of individuals on a common genomic region. Specifically, we explore the prospects for improving on the implicit enumeration strategy of implicit enumeration strategy used in previous work using a novel problem formulation and a series of strengthening valid inequalities and preliminary symmetry breaking constraints to more precisely bound the solution space and accelerate implicit enumeration of possible optimal phylogenies. We present the basic formulation and then introduce a series of provable valid constraints to reduce the solution space. We then prove that these constraints can often lead to significant reductions in the gap between the optimal solution and its non-integral linear programming bound relative to the prior art as well as often substantially faster processing of moderately hard problem instances. Conclusion: We provide an indication of the conditions under which such an optimal enumeration approach is likely to be feasible, suggesting that these strategies are usable for relatively large numbers of taxa, although with stricter limits on numbers of variable sites. The work thus provides methodology suitable for provably optimal solution of some harder instances that resist all prior approaches.
引用
收藏
页数:14
相关论文
共 38 条
[1]   A POLYNOMIAL-TIME ALGORITHM FOR THE PERFECT PHYLOGENY PROBLEM WHEN THE NUMBER OF CHARACTER STATES IS FIXED [J].
AGARWALA, R ;
FERNANDEZBACA, D .
SIAM JOURNAL ON COMPUTING, 1994, 23 (06) :1216-1224
[2]  
Albert V.A., 2005, Parsimony, phylogeny, and genomics
[3]  
[Anonymous], 2004, Inferring phylogenies
[4]  
[Anonymous], 2000, 0037 ZIB
[5]   Industrial applications of high-performance computing for phylogeny reconstruction [J].
Bader, DA ;
Moret, BME ;
Vawter, L .
COMMERCIAL APPLICATIONS FOR HIGH-PERFORMANCE COMPUTING, 2001, 4528 :159-168
[6]   Haplotyping as perfect phylogeny: A direct approach [J].
Bafna, V ;
Gusfield, D ;
Lancia, G ;
Yooseph, S .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (3-4) :323-340
[7]  
BEYER W A, 1974, Mathematical Biosciences, V19, P9, DOI 10.1016/0025-5564(74)90028-5
[8]   A linear-time algorithm for the perfect phylogeny haplotype problem [J].
Bonizzoni, Paola .
ALGORITHMICA, 2007, 48 (03) :267-285
[9]   Predicting the evolution of human influenza A [J].
Bush, RM ;
Bender, CA ;
Subbarao, K ;
Cox, NJ ;
Fitch, WM .
SCIENCE, 1999, 286 (5446) :1921-1925
[10]  
Catanzaro D, 2011, MATHEMATICAL APPROACHES TO POLYMER SEQUENCE ANALYSIS AND RELATED PROBLEMS, P149, DOI 10.1007/978-1-4419-6800-5_8