Fast, Linear Time Hierarchical Clustering using the Baire Metric

被引:12
作者
Contreras, Pedro [1 ,2 ]
Murtagh, Fionn [3 ,4 ]
机构
[1] Univ London, Dept Comp Sci, Egham TW20 0EX, Surrey, England
[2] ThinkingSafe Ltd, Egham, Surrey, England
[3] Univ London, Dublin, Ireland
[4] Sci Fdn Ireland, Dublin, Ireland
关键词
Hierarchical clustering; Ultrametric; Redshift; k-means; p-adic; m-adic; Baire; Longest common prefix; ULTRAMETRICITY; DENDROGRAMS; COMPUTATION;
D O I
10.1007/s00357-012-9106-3
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.
引用
收藏
页码:118 / 143
页数:26
相关论文
共 50 条
  • [31] Mean residence time by hierarchical clustering analysis
    Guzzardi, L.
    Cazar, D. F.
    del Hierro, C. V.
    Torres, F. J.
    Mendez, M. A.
    THEORETICAL CHEMISTRY ACCOUNTS, 2016, 135 (04)
  • [32] Graph Anonymization Using Hierarchical Clustering
    Mohapatra, Debasis
    Patra, Manas Ranjan
    COMPUTATIONAL INTELLIGENCE IN DATA MINING, 2019, 711 : 145 - 154
  • [33] Ear Localization using Hierarchical Clustering
    Prakash, Surya
    Jayaraman, Umarani
    Gupta, Phalguni
    OPTICS AND PHOTONICS IN GLOBAL HOMELAND SECURITY V AND BIOMETRIC TECHNOLOGY FOR HUMAN IDENTIFICATION VI, 2009, 7306
  • [34] Semantic Clustering of Functional Requirements Using Agglomerative Hierarchical Clustering
    Salman, Hamzeh Eyal
    Hammad, Mustafa
    Seriai, Abdelhak-Djamel
    Al-Sbou, Ahed
    INFORMATION, 2018, 9 (09)
  • [35] Data clustering and analyzing techniques using hierarchical clustering method
    Hu, Wen
    Pan, Qing He
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (19) : 8495 - 8504
  • [36] Data clustering and analyzing techniques using hierarchical clustering method
    Wen Hu
    Qing he Pan
    Multimedia Tools and Applications, 2015, 74 : 8495 - 8504
  • [37] Hierarchical clustering and the construction of (optimal) ultrametrics using Lp-norms
    Hubert, L
    Arabie, P
    Meulman, J
    L(1)-STATISTICAL PROCEDURES AND RELATED TOPICS, 1997, 31 : 457 - 472
  • [38] Constraint-based Hierarchical Clustering for Time Sequences
    Kou, Yufeng
    Knackstedt, Chris
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 2705 - 2711
  • [39] Hierarchical Agglomerative Clustering of Time-Warped Series
    Kotas, Marian
    Leski, Jacek
    Moron, Tomasz
    Guzman, Jader Giraldo
    MAN-MACHINE INTERACTIONS 5, ICMMI 2017, 2018, 659 : 207 - 216
  • [40] A Fast Quad-Tree Based Two Dimensional Hierarchical Clustering
    Rajadurai, Priscilla
    Sankaranarayanan, Swamynathan
    BIOINFORMATICS AND BIOLOGY INSIGHTS, 2012, 6 : 265 - 274