Detecting Character Dependencies in Stochastic Models of Evolution

被引:0
作者
Chakrabarty, Deeparnab [1 ]
Kannan, Sampath [2 ]
Tian, Kevin [2 ]
机构
[1] Microsoft Res, Bangalore, Karnataka, India
[2] Univ Penn, Dept Comp & Informat Sci, 3330 Walnut St,Levine Hall, Philadelphia, PA 19104 USA
关键词
dependent characters; efficient algorithms; evolution; stochastic models; CORRELATED EVOLUTION; MAXIMUM-LIKELIHOOD; LOGS SUFFICE; TREES; SEQUENCES; BUILD;
D O I
10.1089/cmb.2015.0099
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Stochastic models of biological evolution generally assume that different characters (runs of the stochastic process) are independent and identically distributed. In this article we determine the asymptotic complexity of detecting dependence for some fairly general models of evolution, but simple models of dependence. A key difference from much of the previous work is that our algorithms work without knowledge of the tree topology. Specifically, we consider various stochastic models of evolution ranging from the common ones used by biologists (such as Cavender-Farris-Neyman and Jukes-Cantor models) to very general ones where evolution of different characters can be governed by different transition matrices on each edge of the evolutionary tree (phylogeny). We also consider several models of dependence between two characters. In the most specific model, on each edge of the phylogeny the joint distribution of the dependent characters undergoes a perturbation of a fixed magnitude, in a fixed direction from what it would be if the characters were evolving independently. More general dependence models don't require such a strong "signal." Instead they only require that on each edge, the perturbation of the joint distribution has a significant component in a specific direction. Our main results are nearly tight bounds on the induced or operator norm of the transition matrices that would allow us to detect dependence efficiently for most models of evolution and dependence that we consider. We make essential use of a new concentration result for multistate random variables of a Markov random field on arbitrary trivalent trees: We show that the random variable counting the number of leaves in any particular state has variance that is subquadratic in the number of leaves.
引用
收藏
页码:180 / 191
页数:12
相关论文
共 27 条
[1]  
[Anonymous], 2004, Inferring phylogenies
[2]  
[Anonymous], 2009, American Mathematical Soc.
[3]  
CAVENDER JA, 1978, MATH BIOSCI, V40, P271, DOI 10.1016/0025-5564(78)90089-5
[4]   Full reconstruction of Markov models on evolutionary trees: Identifiability and consistency [J].
Chang, JT .
MATHEMATICAL BIOSCIENCES, 1996, 137 (01) :51-73
[5]  
Daskalakis C., 2006, STOC'06. Proceedings of the 38th Annual ACM Symposium on Theory of Computing, P159, DOI 10.1145/1132516.1132540
[6]  
Erdos PL, 1999, RANDOM STRUCT ALGOR, V14, P153, DOI 10.1002/(SICI)1098-2418(199903)14:2<153::AID-RSA3>3.0.CO
[7]  
2-R
[8]   A few logs suffice to build (almost) all trees:: Part II [J].
Erdos, PL ;
Steel, MA ;
Székely, LA ;
Warnow, TJ .
THEORETICAL COMPUTER SCIENCE, 1999, 221 (1-2) :77-118
[9]   Efficient algorithms for inverting evolution [J].
Farach, M ;
Kannan, S .
JOURNAL OF THE ACM, 1999, 46 (04) :437-449
[10]   PROBABILITY MODEL FOR INFERRING EVOLUTIONARY TREES [J].
FARRIS, JS .
SYSTEMATIC ZOOLOGY, 1973, 22 (03) :250-256