Fast, Linear Time Hierarchical Clustering using the Baire Metric

被引:12
作者
Contreras, Pedro [1 ,2 ]
Murtagh, Fionn [3 ,4 ]
机构
[1] Univ London, Dept Comp Sci, Egham TW20 0EX, Surrey, England
[2] ThinkingSafe Ltd, Egham, Surrey, England
[3] Univ London, Dublin, Ireland
[4] Sci Fdn Ireland, Dublin, Ireland
关键词
Hierarchical clustering; Ultrametric; Redshift; k-means; p-adic; m-adic; Baire; Longest common prefix; ULTRAMETRICITY; DENDROGRAMS; COMPUTATION;
D O I
10.1007/s00357-012-9106-3
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.
引用
收藏
页码:118 / 143
页数:26
相关论文
共 50 条
  • [21] Fast tree aggregation for consensus hierarchical clustering
    Hulot, Audrey
    Chiquet, Julien
    Jaffrezic, Florence
    Rigaill, Guillem
    BMC BIOINFORMATICS, 2020, 21 (01)
  • [22] Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing
    Koga, Hisashi
    Ishibashi, Tetsuo
    Watanabe, Toshinori
    KNOWLEDGE AND INFORMATION SYSTEMS, 2007, 12 (01) : 25 - 53
  • [23] Fast tree aggregation for consensus hierarchical clustering
    Audrey Hulot
    Julien Chiquet
    Florence Jaffrézic
    Guillem Rigaill
    BMC Bioinformatics, 21
  • [24] Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing
    Hisashi Koga
    Tetsuo Ishibashi
    Toshinori Watanabe
    Knowledge and Information Systems, 2007, 12 : 25 - 53
  • [25] A PWA model identification method for nonlinear systems using hierarchical clustering based on the gap metric
    Wang, Jiaorao
    Song, Chunyue
    Zhao, Jun
    Xu, Zuhua
    COMPUTERS & CHEMICAL ENGINEERING, 2020, 138 (138)
  • [26] Non-linear correlation analysis in financial markets using hierarchical clustering
    Salgado-Hernandez, J. E.
    Vyas, Manan
    JOURNAL OF PHYSICS COMMUNICATIONS, 2023, 7 (05):
  • [27] Evolutionary hierarchical time series clustering
    Chis, Monica
    Grosan, Crina
    ISDA 2006: SIXTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, VOL 1, 2006, : 451 - 455
  • [28] Fast R Functions for Robust Correlations and Hierarchical Clustering
    Langfelder, Peter
    Horvath, Steve
    JOURNAL OF STATISTICAL SOFTWARE, 2012, 46 (11): : 1 - 17
  • [29] An Effective Performance of Fuzzy Hierarchical Clustering Using Time Series Data Streams
    Kavitha, V.
    Punithavalli, M.
    COMPUTER NETWORKS AND INFORMATION TECHNOLOGIES, 2011, 142 : 242 - +
  • [30] Mean residence time by hierarchical clustering analysis
    L. Guzzardi
    D. F. Cazar
    C. V. del Hierro
    F. J. Torres
    M. A. Méndez
    Theoretical Chemistry Accounts, 2016, 135