Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees

Cited by: 34
Authors
Chakerian, John [1]
Holmes, Susan [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
Keywords
Bootstrap; Hierarchical clustering; Multidimensional scaling; Negatively curved space; Phylogenetic tree; INFERENCE; ALGORITHMS
DOI
10.1080/10618600.2012.640901
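Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract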
Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960s. In bioinformatics, psychometrics, and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and "generalizability" of these summaries. This article describes the implementation of the geometric distance between trees developed by Billera, Holmes, and Vogtmann (2001) equally applicable to phylogenetic trees and hierarchical clustering trees, and shows some of the applications in evaluating tree estimates. In particular, since Billera et al. (2001) have shown that the space of trees is negatively curved (called a CAT(0) space), a collection of trees can naturally be represented as a tree. We compare this representation to the Euclidean approximations of treespace made available through both a classical multidimensional scaling and a Kernel multidimensional scaling of the matrix of the distances between trees. We also provide applications of the distances between trees to hierarchical clustering trees constructed from microarrays. Our method gives a new way of evaluating the influence of both certain columns (positions, variables, or genes) and certain rows (species, observations, or arrays) on the construction of such trees. It also can provide a way of detecting heterogeneous mixtures in the input data. Supplementary materials for this article are available online.
Pages: 581-599
Page count: 19
Related papers
39 items in total
[1]  
[Anonymous], P INT C NEUR INF PRO
[2]  
[Anonymous], 1999, METRIC SPACES NONPOS
[3]  
[Anonymous], 1980, Multivariate Analysis
[4]  
[Anonymous], 2006, ANAL PHYLOGENETICS E
[5]  
Carr D.B., 1997, STAT COMPUTING GRAPH, V8, P20
[6]  
Chakerian J., 2010, distory: Distances Between Trees
[7]   DETECTION OF INFLUENTIAL OBSERVATION IN LINEAR-REGRESSION [J].
COOK, RD .
TECHNOMETRICS, 1977, 19 (01) :15-18
[8]   Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle [J].
Desper, R ;
Gascuel, O .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2002, 9 (05) :687-705
[9]   HORSESHOES IN MULTIDIMENSIONAL SCALING AND LOCAL KERNEL METHODS [J].
Diaconis, Persi ;
Goel, Sharad ;
Holmes, Susan .
ANNALS OF APPLIED STATISTICS, 2008, 2 (03) :777-807
[10]   Bootstrap confidence levels for phylogenetic trees (vol 93, pg 7085, 1996) [J].
Efron, B ;
Halloran, E ;
Holmes, S .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1996, 93 (23) :13429-13434