Perfect Phylogeny Problems with Missing Values

被引:5
作者
Kirkpatrick, Bonnie [1 ]
Stevens, Kristian [2 ]
机构
[1] Univ Miami, Dept Comp Sci, Coral Gables, FL 33124 USA
[2] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
关键词
Perfect phylogeny; rich data hypothesis; missing data; partition intersection graph; ALGORITHM; MATRICES;
D O I
10.1109/TCBB.2014.2316005
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
The perfect phylogeny problem is of central importance to both evolutionary biology and population genetics. Missing values are a common occurrence in both sequence and genotype data, but they make the problem of finding a perfect phylogeny NP-hard even for binary characters. We introduce new and efficient perfect phylogeny algorithms for broad classes of binary and multistate data with missing values. Specifically, we address binary missing data consistent with the rich data hypothesis (RDH) introduced by Halperin and Karp and give an efficient algorithm for enumerating phylogenies. This algorithm is useful for computing the probability of data with missing values under the coalescent model. In addition, we use the partition intersection (PI) graph and chordal graph theory to generalize the RDH to multi-state characters with missing values. For a bounded number of states, we provide a fixed parameter tractable algorithm for the perfect phylogeny problem with missing data. Utilizing the PI graph, we are able to show that under multiple biologically motivated models for character data, our generalized RDH holds with high probability, and we evaluate our results with extensive empirical analysis.
引用
收藏
页码:928 / 941
页数:14
相关论文
共 30 条
[1]   A POLYNOMIAL-TIME ALGORITHM FOR THE PERFECT PHYLOGENY PROBLEM WHEN THE NUMBER OF CHARACTER STATES IS FIXED [J].
AGARWALA, R ;
FERNANDEZBACA, D .
SIAM JOURNAL ON COMPUTING, 1994, 23 (06) :1216-1224
[2]  
Blair Jean R. S., 1993, GRAPH THEORY SPARSE, P1, DOI [DOI 10.1007/978-1-4613-8369-7, DOI 10.1007/978-1-4613-8369-71]
[3]  
Buneman P., 1974, Discrete Mathematics, V9, P205, DOI 10.1016/0012-365X(74)90002-8
[4]   Efficient whole-genome association mapping using local phylogenies for unphased genotype data [J].
Ding, Zhihong ;
Mailund, Thomas ;
Song, Yun S. .
BIOINFORMATICS, 2008, 24 (19) :2215-2221
[5]  
Dirac G. A., 1961, Abhandlungen aus dem Mathematicschen Seminar der University Hamburg, V25, P71, DOI [10.1007/BF02992776, DOI 10.1007/BF02992776]
[6]   CONVEX TREE REALIZATIONS OF PARTITIONS [J].
DRESS, A ;
STEEL, M .
APPLIED MATHEMATICS LETTERS, 1992, 5 (03) :3-6
[7]   INCIDENCE MATRICES AND INTERVAL GRAPHS [J].
FULKERSON, DR ;
GROSS, OA .
PACIFIC JOURNAL OF MATHEMATICS, 1965, 15 (03) :835-+
[8]  
Golumbic M.C., 2004, Algorithmic Graph Theory and Perfect Graphs
[9]  
Gusfield D, 2007, LECT NOTES COMPUT SC, V4598, P51
[10]  
Gusfield D, 2009, LECT NOTES COMPUT SC, V5541, P236, DOI 10.1007/978-3-642-02008-7_18