Fast, Linear Time Hierarchical Clustering using the Baire Metric

被引:12
作者
Contreras, Pedro [1 ,2 ]
Murtagh, Fionn [3 ,4 ]
机构
[1] Univ London, Dept Comp Sci, Egham TW20 0EX, Surrey, England
[2] ThinkingSafe Ltd, Egham, Surrey, England
[3] Univ London, Dublin, Ireland
[4] Sci Fdn Ireland, Dublin, Ireland
关键词
Hierarchical clustering; Ultrametric; Redshift; k-means; p-adic; m-adic; Baire; Longest common prefix; ULTRAMETRICITY; DENDROGRAMS; COMPUTATION;
D O I
10.1007/s00357-012-9106-3
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.
引用
收藏
页码:118 / 143
页数:26
相关论文
共 50 条
  • [41] A Multi-metric Algorithm for Hierarchical Clustering of Same-Length Protein Sequences
    Tsarouchis, Sotirios-Filippos
    Kotouza, Maria Th
    Psomopoulos, Fotis E.
    Mitkas, Pericles A.
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2018, 2018, 520 : 189 - 199
  • [42] Fast and Accurate Hierarchical Clustering Based on Growing Multilayer Topology Training
    Cheung, Yiu-ming
    Zhang, Yiqun
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (03) : 876 - 890
  • [43] Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm
    Gagolewski, Marek
    Bartoszuk, Maciej
    Cena, Anna
    INFORMATION SCIENCES, 2016, 363 : 8 - 23
  • [44] Unsupervised Image Segmentation Using Hierarchical Clustering
    Keiko Ohkura
    Hidekazu Nishizawa
    Takashi Obi
    Akira Hasegawa
    Masahiro Yamaguchi
    Nagaaki Ohyama
    Optical Review, 2000, 7 : 193 - 198
  • [45] Unsupervised image segmentation using hierarchical clustering
    Ohkura, K
    Nishizawa, H
    Obi, T
    Hasegawa, A
    Yamaguchi, M
    Ohyama, N
    OPTICAL REVIEW, 2000, 7 (03) : 193 - 198
  • [46] IDENTIFYING WAREHOUSE LOCATION USING HIERARCHICAL CLUSTERING
    Skerlic, Sebastjan
    Muha, Robert
    TRANSPORT PROBLEMS, 2016, 11 (03) : 121 - 129
  • [47] Financial Time Series Forecasting with Grouped Predictors using Hierarchical Clustering and Support Vector Regression
    ZheGao
    JianjunYang
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2014, 7 (05): : 53 - 64
  • [48] TREND AND TIME SERIES CLUSTER ANALYSIS OF CRIME INCIDENCES IN INDIA USING DYNAMIC TIME WARPING AND HIERARCHICAL CLUSTERING METHOD
    Goswami, Daalima
    Hazarika, Jiten
    INTERNATIONAL JOURNAL OF AGRICULTURAL AND STATISTICAL SCIENCES, 2022, 18 (02): : 499 - 506
  • [49] Time series clustering by a robust autoregressive metric with application to air pollution
    D'Urso, Pierpaolo
    De Giovanni, Livia
    Massari, Riccardo
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2015, 141 : 107 - 124
  • [50] Personalizing navigation in folksonomies using hierarchical tag clustering
    Gernmell, Jonathan
    Shepitsen, Andriy
    Mobasher, Bamshad
    Burke, Robin
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2008, 5182 : 196 - 205