Approximating snowflake metrics by trees

被引:0
作者
Leeb, William [1 ,2 ]
机构
[1] Yale Univ, Dept Math, New Haven, CT 06511 USA
[2] Princeton Univ, PACM, Princeton, NJ 08554 USA
关键词
Tree metric; Partition trees; Tree approximation; Snowflake metric; Dimension; Spaces of homogeneous type; Earth Mover's Distance; EMD; EARTH MOVERS DISTANCE; EMBEDDINGS;
D O I
10.1016/j.acha.2016.10.002
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
Tree metrics are encountered throughout pure and applied mathematics. Their simple structure makes them a convenient choice of metric in many applications from machine learning and computer science. At the same time, there is an elegant theory of harmonic analysis with respect to tree metrics that parallels the classical theory. A basic question in this field, which is of both theoretical and practical interest, is how to design efficient algorithms for building trees with good metric properties. In particular, given a finite metric space, we seek a random family of dominating tree metrics approximating the underlying metric in expectation. For general metrics, this problem has been solved: on the one hand, there are finite metric spaces that cannot be approximated by trees without incurring a distortion logarithmic in the size of the space, while the tree construction of Fakcharoenphol, Rao, and Talwar (FRT, 2003) shows how to achieve such a logarithmic error for arbitrary metrics. Since a distortion that grows even logarithmically with the size of the set may be too large for practical use in many settings, one naturally asks if there is a more restricted class of metrics where one can do better. The main result of this paper is that certain random family of trees already studied in the computer science literature, including the FRT trees, can be used to approximate snowflake metrics (metrics raised to a power less than 1) with expected distortion bounded by its doubling dimension and the degree of snowflaking. We also show that without snowflaking, the metric distortion can be bounded by a term logarithmic in the distance being approximated and linear in the dimension. We also present an optimal algorithm for building a single FRT tree, whose running time is bounded independently of all problem parameters other than the number of points. We conclude by demonstrating our theoretical results on a numerical example, and applying them to the approximation of the Earth Mover's Distance between probability distributions. (C) 2016 Elsevier Inc. All rights reserved.
引用
收藏
页码:405 / 424
页数:20
相关论文
共 37 条
  • [1] Abraham I., 2006, P 26 IEEE INT C DIST, P1
  • [2] A GRAPH-THEORETIC GAME, AND ITS APPLICATION TO THE K-SERVER PROBLEM
    ALON, N
    KARP, RM
    PELEG, D
    WEST, D
    [J]. SIAM JOURNAL ON COMPUTING, 1995, 24 (01) : 78 - 100
  • [3] [Anonymous], 2008, Computer Vision and Pattern Recognition
  • [4] [Anonymous], P 44 ANN IEEE S FDN
  • [5] [Anonymous], STOC 2002
  • [6] ASSOUAD P, 1983, B SOC MATH FR, V111, P429
  • [7] Bartal Y., 1998, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, P161, DOI 10.1145/276698.276725
  • [8] Probabilistic approximation of metric spaces and its algorithmic applications
    Bartal, Y
    [J]. 37TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 1996, : 184 - 193
  • [9] Laplacian eigenmaps for dimensionality reduction and data representation
    Belkin, M
    Niyogi, P
    [J]. NEURAL COMPUTATION, 2003, 15 (06) : 1373 - 1396
  • [10] Beygelzimer A., 2006, ICML, DOI DOI 10.1145/1143844.1143857