An expressive dissimilarity measure for relational clustering using neighbourhood trees

Cited by: 4
Authors
Dumancic, Sebastijan [1]
Blockeel, Hendrik [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Comp Sci, Celestijnenlaan 200A, Heverlee, Belgium
Keywords
Relational learning; Clustering; Similarity of structured objects; Classification; Aggregation; Kernel
DOI
10.1007/s10994-017-5644-6
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is an underspecified task: there are no universal criteria for what makes a good clustering. This is especially true for relational data, where similarity can be based on the features of individuals, the relationships between them, or a mix of both. Existing methods for relational clustering have strong and often implicit biases in this respect. In this paper, we introduce a novel dissimilarity measure for relational data. It is the first approach to incorporate a wide variety of types of similarity, including similarity of attributes, similarity of relational context, and proximity in a hypergraph. We experimentally evaluate the proposed dissimilarity measure on both clustering and classification tasks, using data sets of very different types. Considering the quality of the obtained clusterings, the experiments demonstrate that (a) using this dissimilarity in standard clustering methods consistently gives good results, whereas other measures work well only on data sets that match their bias; and (b) on most data sets, the novel dissimilarity outperforms even the best of the existing ones. On the classification tasks, the proposed method outperforms the competitors on the majority of data sets, often by a large margin. Moreover, we show that learning the appropriate bias in an unsupervised way is a very challenging task, and that existing methods for doing so offer at best a marginal gain over the proposed measure, and can even hurt performance. Finally, we show that the asymptotic complexity of the proposed dissimilarity measure is similar to that of the existing state-of-the-art approaches. The results confirm that the proposed dissimilarity measure is versatile enough to capture the relevant information, whether it comes from the attributes of vertices, their proximity, or their connectedness, even without parameter tuning.
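The abstract describes a measure that blends attribute similarity, similarity of relational context, and proximity in the data graph, and then plugs the resulting dissimilarity into standard clustering algorithms. The sketch below illustrates that general recipe only, not the paper's neighbourhood-tree construction: the toy graph, the three component dissimilarities, and the interpolation weights are hypothetical choices made purely for illustration.

# Minimal, illustrative sketch (not the paper's neighbourhood-tree measure):
# combine three hand-picked dissimilarity components into one precomputed
# matrix and feed it to a standard clustering routine. All numbers are toy data.
import numpy as np
import networkx as nx
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy relational data: vertices with attribute vectors, plus edges (relations).
G = nx.Graph()
attrs = {
    "a": [1.0, 0.1], "b": [0.9, 0.2], "c": [0.8, 0.0],
    "d": [0.1, 1.0], "e": [0.2, 0.9], "f": [0.0, 0.8],
}
G.add_nodes_from(attrs)
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"),
                  ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")])

nodes = list(G.nodes)
n = len(nodes)
sp = dict(nx.all_pairs_shortest_path_length(G))  # graph proximity
max_sp = max(d for row in sp.values() for d in row.values())

def component_dissimilarities(u, v):
    """Return (attribute, neighbourhood, proximity) dissimilarities on roughly comparable scales."""
    d_attr = np.linalg.norm(np.array(attrs[u]) - np.array(attrs[v]))
    nu, nv = set(G.neighbors(u)), set(G.neighbors(v))
    d_nbr = 1.0 - len(nu & nv) / max(len(nu | nv), 1)  # Jaccard distance of relational contexts
    d_prox = sp[u].get(v, max_sp) / max_sp              # normalised shortest-path length
    return np.array([d_attr, d_nbr, d_prox])

weights = np.array([0.4, 0.3, 0.3])  # hypothetical interpolation weights
D = np.zeros((n, n))
for i, u in enumerate(nodes):
    for j, v in enumerate(nodes):
        if i < j:
            D[i, j] = D[j, i] = weights @ component_dissimilarities(u, v)

# Plug the precomputed dissimilarity matrix into ordinary agglomerative clustering.
labels = fcluster(linkage(squareform(D), method="average"), t=2, criterion="maxclust")
print(dict(zip(nodes, labels)))

In the paper the components are derived from neighbourhood trees rooted in each vertex; here they are deliberately simple stand-ins so that the pattern of interest stays visible: several kinds of similarity are interpolated into one dissimilarity, which any off-the-shelf clustering method accepting a precomputed matrix can consume.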
Pages: 1523-1545
Page count: 23
Related papers
Items [31]-[40] of 50
[31] Daneshgar, A.; Javadi, R.; Razavi, S. B. Shariat. Clustering and outlier detection using isoperimetric number of trees. Pattern Recognition, 2013, 46(12): 3371-3382.
[32] Schmidsberger, Falk; Stolzenburg, Frieder. Semantic object recognition using clustering and decision trees. ICAART 2011: Proceedings of the 3rd International Conference on Agents and Artificial Intelligence, Vol 1, 2011: 670-673.
[33] Torres-Nino, Javier; Rodriguez-Gonzalez, Alejandro; Colomo-Palacios, Ricardo; Jimenez-Domingo, Enrique; Alor-Hernandez, Giner. Improving accuracy of decision trees using clustering techniques. Journal of Universal Computer Science, 2013, 19(4): 484-501.
[34] Yang, Fan; Zhu, Qing-Xin; Tang, Dong-Ming; Zhao, Ming-Yuan. Clustering protein sequences using affinity propagation based on an improved similarity measure. Evolutionary Bioinformatics, 2009, 5: 137-146.
[35] Tsai, Chieh-Yuan; Chiu, Chuang-Cheng. An efficient feature selection approach for clustering: Using a Gaussian mixture model of data dissimilarity. Computational Science and Its Applications - ICCSA 2007, Pt 1, Proceedings, 2007, 4705: 1107-1118.
[36] Rani, B.; Kant, S. Estimating cluster validity using compactness measure and overlap measure for fuzzy clustering. International Journal of Business Intelligence and Data Mining, 2022, 20(3): 345-363.
[37] Rahman, Md Anisur; Ang, Li-minn; Sun, Yuan; Seng, Kah Phooi. A deep embedded clustering technique using dip test and unique neighbourhood set. Neural Computing and Applications, 2025, 37(3): 1345-1356.
[38] Sbai, Sara; Chabih, Oussama; Louhdi, Mohammed Reda Chbihi; Behja, Hicham; Zemmouri, El Moukhtar; Trousse, Brigitte. Using decision trees to learn ontology taxonomies from relational databases. 2020 6th IEEE Congress on Information Science and Technology (IEEE CIST'20), 2020: 54-58.
[39] Zabiniako, Vitaly. Using force-based graph layout for clustering of relational data. Advances in Databases and Information Systems, 2010, 5968: 193-201.
[40] Mishra, Gaurav; Kar, Amit Kumar; Mishra, Amaresh Chandra; Mohanty, Sraban Kumar; Panda, M. K. SEND: A novel dissimilarity metric using ensemble properties of the feature space for clustering numerical data. Information Sciences, 2021, 574: 279-296.