An expressive dissimilarity measure for relational clustering using neighbourhood trees

Cited by: 4
Authors
Dumancic, Sebastijan [1]
Blockeel, Hendrik [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Comp Sci, Celestijnenlaan 200A, Heverlee, Belgium
Keywords
Relational learning; Clustering; Similarity of structured objects; CLASSIFICATION; AGGREGATION; KERNEL;
DOI
10.1007/s10994-017-5644-6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is an underspecified task: there are no universal criteria for what makes a good clustering. This is especially true for relational data, where similarity can be based on the features of individuals, the relationships between them, or a mix of both. Existing methods for relational clustering have strong and often implicit biases in this respect. In this paper, we introduce a novel dissimilarity measure for relational data. It is the first approach to incorporate a wide variety of types of similarity, including similarity of attributes, similarity of relational context, and proximity in a hypergraph. We experimentally evaluate the proposed dissimilarity measure on both clustering and classification tasks using data sets of very different types. Considering the quality of the obtained clustering, the experiments demonstrate that (a) using this dissimilarity in standard clustering methods consistently gives good results, whereas other measures work well only on data sets that match their bias; and (b) on most data sets, the novel dissimilarity outperforms even the best among the existing ones. On the classification tasks, the proposed method outperforms the competitors on the majority of data sets, often by a large margin. Moreover, we show that learning the appropriate bias in an unsupervised way is a very challenging task, and that the existing methods offer a marginal gain compared to the proposed similarity method, and can even hurt performance. Finally, we show that the asymptotic complexity of the proposed dissimilarity measure is similar to the existing state-of-the-art approaches. The results confirm that the proposed dissimilarity measure is indeed versatile enough to capture relevant information, regardless of whether that comes from the attributes of vertices, their proximity, or connectedness of vertices, even without parameter tuning.
Pages: 1523-1545
Number of pages: 23
Related papers
50 in total
  • [21] Speech recognition using randomized relational decision trees
    Amit, Y
    Murua, A
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (04) : 333 - 341
  • [22] Web Service Clustering Using Relational Database Approach
    Liu, Jianxiao
    Liu, Feng
    Li, Xiaoxia
    He, Keqing
    Ma, Yutao
    Wang, Jian
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2015, 25 (08) : 1365 - 1393
  • [23] Relational Gustafson Kessel clustering using medoids and triangulation
    Runkler, TA
    FUZZ-IEEE 2005: Proceedings of the IEEE International Conference on Fuzzy Systems: BIGGEST LITTLE CONFERENCE IN THE WORLD, 2005, : 73 - 78
  • [24] A novel clustering method for complex signals and feature extraction based on advanced information-based dissimilarity measure
    Shang, Du
    Shang, Pengjian
    Li, Ang
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [25] Graph clustering using k-Neighbourhood Attribute Structural similarity
    Boobalan, M. Parimala
    Lopez, Daphne
    Gao, X. Z.
    APPLIED SOFT COMPUTING, 2016, 47 : 216 - 223
  • [26] Classification by clustering using an extended saliency measure
    Barak, A.
    Gelbard, R.
    EXPERT SYSTEMS, 2016, 33 (01) : 46 - 59
  • [27] Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm
    Skabar, Andrew
    Abdalgader, Khaled
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (01) : 62 - 75
  • [28] An Improved K-modes Clustering Algorithm Based on Intra-cluster and Inter-cluster Dissimilarity Measure
    Zhou, Hongfang
    Zhang, Yihui
    Liu, Yibin
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE & APPLICATION TECHNOLOGY (ICCIA 2017), 2017, 74 : 410 - 418
  • [29] Multimodal Object Recognition Using Random Clustering Trees
    Villamizar, M.
    Garrell, A.
    Sanfeliu, A.
    Moreno-Noguer, F.
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2015), 2015, 9117 : 496 - 504
  • [30] Optimal Interpretable Clustering Using Oblique Decision Trees
    Gabidolla, Magzhan
    Carreira-Perpinan, Miguel A.
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 400 - 410