An expressive dissimilarity measure for relational clustering using neighbourhood trees

Cited by: 4
Authors
Dumančić, Sebastijan [1]
Blockeel, Hendrik [1]
Affiliation
[1] Katholieke Universiteit Leuven, Department of Computer Science, Celestijnenlaan 200A, Heverlee, Belgium
Keywords
Relational learning; Clustering; Similarity of structured objects; Classification; Aggregation; Kernel
DOI
10.1007/s10994-017-5644-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is an underspecified task: there are no universal criteria for what makes a good clustering. This is especially true for relational data, where similarity can be based on the features of individuals, the relationships between them, or a mix of both. Existing methods for relational clustering have strong and often implicit biases in this respect. In this paper, we introduce a novel dissimilarity measure for relational data. It is the first approach to incorporate a wide variety of types of similarity, including similarity of attributes, similarity of relational context, and proximity in a hypergraph. We experimentally evaluate the proposed dissimilarity measure on both clustering and classification tasks using data sets of very different types. Regarding clustering quality, the experiments demonstrate that (a) using this dissimilarity in standard clustering methods consistently gives good results, whereas other measures work well only on data sets that match their bias; and (b) on most data sets, the new dissimilarity outperforms even the best of the existing ones. On the classification tasks, the proposed method outperforms the competitors on the majority of data sets, often by a large margin. Moreover, we show that learning the appropriate bias in an unsupervised way is very challenging, and that the existing methods offer only a marginal gain over the proposed dissimilarity measure and can even hurt performance. Finally, we show that the asymptotic complexity of the proposed dissimilarity measure is similar to that of existing state-of-the-art approaches. The results confirm that the proposed dissimilarity measure is versatile enough to capture relevant information, regardless of whether it comes from the attributes of vertices, their proximity, or their connectedness, even without parameter tuning.
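The abstract describes a single measure that mixes several kinds of similarity (attributes, relational context, hypergraph proximity) and works without parameter tuning. As a minimal sketch, assuming the combination is a weighted sum of per-aspect dissimilarities, each scaled to [0, 1], with uniform default weights, it could look like the Python below. The `Vertex` class and the three component functions (`attribute_dissim`, `context_dissim`, `proximity_dissim`) are hypothetical stand-ins chosen for illustration; they are not the neighbourhood-tree components defined in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical vertex: attribute values plus names of vertices it shares a (hyper)edge with."""
    name: str
    attrs: dict[str, object]
    neighbours: set[str] = field(default_factory=set)


def attribute_dissim(a: Vertex, b: Vertex) -> float:
    """Hypothetical component: fraction of shared attributes whose values differ."""
    shared = a.attrs.keys() & b.attrs.keys()
    if not shared:
        return 1.0
    return sum(a.attrs[k] != b.attrs[k] for k in shared) / len(shared)


def context_dissim(a: Vertex, b: Vertex, graph: dict[str, Vertex]) -> float:
    """Hypothetical component: Jaccard distance between the (attribute, value) pairs
    observed among each vertex's direct neighbours (values assumed hashable)."""
    def context(v: Vertex) -> set[tuple[str, object]]:
        return {(k, val) for n in v.neighbours for k, val in graph[n].attrs.items()}

    ca, cb = context(a), context(b)
    union = ca | cb
    if not union:
        return 0.0
    return 1.0 - len(ca & cb) / len(union)


def proximity_dissim(a: Vertex, b: Vertex) -> float:
    """Hypothetical component: 0 for the same or directly connected vertices, else 1."""
    if a.name == b.name or b.name in a.neighbours or a.name in b.neighbours:
        return 0.0
    return 1.0


def combined_dissim(a: Vertex, b: Vertex, graph: dict[str, Vertex],
                    weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted sum of the three components; uniform default weights (an assumption) need no tuning."""
    parts = (attribute_dissim(a, b), context_dissim(a, b, graph), proximity_dissim(a, b))
    return sum(w * p for w, p in zip(weights, parts))


# Tiny example: two departments, each with one professor and one student.
GRAPH = {
    "ann": Vertex("ann", {"dept": "cs", "role": "prof"}, {"bob"}),
    "bob": Vertex("bob", {"dept": "cs", "role": "student"}, {"ann"}),
    "cid": Vertex("cid", {"dept": "math", "role": "prof"}, {"dan"}),
    "dan": Vertex("dan", {"dept": "math", "role": "student"}, {"cid"}),
}

if __name__ == "__main__":
    print(round(combined_dissim(GRAPH["ann"], GRAPH["bob"], GRAPH), 3))  # 0.389
    print(round(combined_dissim(GRAPH["ann"], GRAPH["cid"], GRAPH), 3))  # 0.722
```

In this toy graph the attribute component alone rates ann-bob and ann-cid as equally far apart (0.5 each); only the proximity and context components separate them, which illustrates why a measure restricted to one kind of similarity carries a strong bias.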
Pages: 1523-1545
Number of pages: 23
Related papers
50 in total
  • [1] An expressive dissimilarity measure for relational clustering using neighbourhood trees
    Dumančić, Sebastijan
    Blockeel, Hendrik
    Machine Learning, 2017, 106: 1523-1545
  • [2] Phylogenetic trees dissimilarity measure based on strict frequent splits set and its application for clustering
    Koperwas, Jakub
    Walczak, Krzysztof
    Rough Sets and Knowledge Technology, 2008, 5009: 604-611
  • [3] Graph Enhanced Fuzzy Clustering for Categorical Data Using a Bayesian Dissimilarity Measure
    Zhang, Chuanbin
    Chen, Long
    Zhao, Yin-Ping
    Wang, Yingxu
    Chen, C. L. Philip
    IEEE Transactions on Fuzzy Systems, 2023, 31(3): 810-824
  • [4] Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data
    Sengupta, Debarka
    Aich, Indranil
    Bandyopadhyay, Sanghamitra
    Journal of Biosciences, 2015, 40(4): 721-730
  • [5] Dynamic Dissimilarity Measure for Support-Based Clustering
    Lee, Daewon
    Lee, Jaewook
    IEEE Transactions on Knowledge and Data Engineering, 2010, 22(6): 900-905
  • [6] A tail dependence-based dissimilarity measure for financial time series clustering
    De Luca, Giovanni
    Zuccolotto, Paola
    Advances in Data Analysis and Classification, 2011, 5(4): 323-340
  • [7] Hierarchical Clustering with Simple Matching and Joint Entropy Dissimilarity Measure
    Cilingirturk, A. Mete
    Ergut, Ozlem
    Journal of Modern Applied Statistical Methods, 2014, 13(1): 329-338
  • [8] On the impact of dissimilarity measure in k-modes clustering algorithm
    Ng, Michael K.
    Li, Mark Junjie
    Huang, Joshua Zhexue
    He, Zengyou
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(3): 503-507
  • [9] Clustering with missing features: a penalized dissimilarity measure based approach
    Datta, Shounak
    Bhattacharjee, Supritam
    Das, Swagatam
    Machine Learning, 2018, 107(12): 1987-2025
  • [10] Clustering using PK-D: A connectivity and density dissimilarity
    Baya, Ariel E.
    Larese, Monica G.
    Granitto, Pablo M.
    Expert Systems with Applications, 2016, 51: 151-160