On clustering tree structured data with categorical nature

Cited by: 13
Authors
Boutsinas, B. [1 ,2 ]
Papastergiou, T. [2 ]
Affiliations
[1] Univ Patras, Dept Business Adm, GR-26500 Rion, Greece
[2] Univ Patras, Artificial Intelligence Res Ctr, GR-26500 Rion, Greece
Keywords
clustering; (dis)similarity measures; data mining
DOI
10.1016/j.patcog.2008.05.023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering consists in partitioning a set of objects into disjoint and homogeneous clusters. For many years, clustering methods have been applied in a wide variety of disciplines and scientific areas. Traditionally, clustering methods deal with numerical data, i.e. objects represented by a conjunction of numerical attribute values. However, today's commercial and scientific databases usually contain categorical data, i.e. objects represented by categorical attributes. In this paper we present a dissimilarity measure capable of dealing with tree-structured categorical data. It can therefore be used to extend the various versions of the popular k-means clustering algorithm to handle such data. We discuss how such an extension can be achieved. Moreover, we demonstrate empirically that the proposed dissimilarity measure is accurate compared with other well-known (dis)similarity measures for categorical data. (C) 2008 Elsevier Ltd. All rights reserved.
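The abstract does not give the definition of the proposed measure, so the following is not the authors' method. It is a minimal sketch of the general idea, assuming that each categorical attribute's values are organized in a hypothetical value hierarchy (here `COLOR` and `SIZE`) and using plain path length between two values as a stand-in dissimilarity. Objects are compared by summing over attributes, and a k-means-style loop replaces the mean with, per attribute, the hierarchy node that is closest in total to the cluster's members. All names (`tree_distance`, `dissimilarity`, `update_center`, `cluster`) are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical value hierarchy for one categorical attribute, stored as
# child -> parent; the root maps to None. An illustrative assumption only.
Tree = Dict[str, Optional[str]]


def ancestors(tree: Tree, value: str) -> List[str]:
    """Path from `value` up to the root, starting with `value` itself."""
    path = [value]
    while tree[path[-1]] is not None:
        path.append(tree[path[-1]])
    return path


def tree_distance(tree: Tree, a: str, b: str) -> int:
    """Stand-in dissimilarity: number of edges between a and b in the hierarchy."""
    up_a, up_b = ancestors(tree, a), ancestors(tree, b)
    steps_from_b = {node: i for i, node in enumerate(up_b)}
    for i, node in enumerate(up_a):
        if node in steps_from_b:  # first shared node is the lowest common ancestor
            return i + steps_from_b[node]
    raise ValueError(f"{a!r} and {b!r} are not in the same hierarchy")


def dissimilarity(trees: Sequence[Tree], x: Sequence[str], y: Sequence[str]) -> int:
    """Object-level dissimilarity: sum of per-attribute tree distances."""
    return sum(tree_distance(t, a, b) for t, a, b in zip(trees, x, y))


def update_center(trees: Sequence[Tree], members: Sequence[Sequence[str]]) -> Tuple[str, ...]:
    """Per attribute, pick the hierarchy node with the smallest total distance to members."""
    center = []
    for j, tree in enumerate(trees):
        best = min(tree, key=lambda v: sum(tree_distance(tree, v, m[j]) for m in members))
        center.append(best)
    return tuple(center)


def cluster(trees, data, initial_centers, max_iter=20):
    """k-means-style loop: assign each object to its nearest center, then recompute centers."""
    centers = [tuple(c) for c in initial_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centers)), key=lambda c: dissimilarity(trees, x, centers[c]))
            for x in data
        ]
        new_centers = []
        for c in range(len(centers)):
            members = [x for x, label in zip(data, labels) if label == c]
            # Keep the previous center if a cluster becomes empty.
            new_centers.append(update_center(trees, members) if members else centers[c])
        if new_centers == centers:  # converged: centers no longer change
            break
        centers = new_centers
    return labels, centers


COLOR: Tree = {"any": None, "warm": "any", "cool": "any",
               "red": "warm", "orange": "warm", "blue": "cool", "green": "cool"}
SIZE: Tree = {"any": None, "small": "any", "large": "any",
              "xs": "small", "s": "small", "l": "large", "xl": "large"}

data = [("red", "xs"), ("orange", "s"), ("red", "s"),
        ("blue", "xl"), ("green", "l"), ("blue", "l")]
labels, centers = cluster([COLOR, SIZE], data, [data[0], data[3]])
print(labels, centers)  # [0, 0, 0, 1, 1, 1] [('red', 's'), ('blue', 'l')]
```

Because the object dissimilarity in this sketch is a sum of per-attribute terms, choosing each center attribute independently minimizes the cluster's total dissimilarity over the candidate nodes. A center may therefore be an internal node such as `warm` when that summarizes mixed members better than any single leaf.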
Pages: 3613-3623
Number of pages: 11
Related papers (50 in total)
  • [21] A SCALABLE CLUSTERING METHOD FOR CATEGORICAL SEQUENCE DATA
    Oh, Seung-Joon
    Kim, Jae-Yearn
    INTERNATIONAL JOURNAL OF COMPUTATIONAL METHODS, 2005, 2 (02) : 167 - 180
  • [22] Kernel Subspace Clustering Algorithm for Categorical Data
    Xu K.-P.
    Chen L.-F.
    Sun H.-J.
    Wang B.-Z.
    Ruan Jian Xue Bao/Journal of Software, 2020, 31 (11) : 3492 - 3505
  • [23] The performance of objective functions for clustering categorical data
    Xiang, Zhengrong
    Islam, Md Zahidul
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8863 : 16 - 28
  • [24] Generalized Similarity Measure for Categorical Data Clustering
    Sharma, Shruti
    Singh, Manoj
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016 : 765 - 769
  • [25] EnsCat: clustering of categorical data via ensembling
    Clarke, Bertrand S.
    Amiri, Saeid
    Clarke, Jennifer L.
    BMC BIOINFORMATICS, 2016, 17
  • [26] A structured family of clustering and tree construction methods
    Bryant, D
    ADVANCES IN APPLIED MATHEMATICS, 2001, 27 (04) : 705 - 732
  • [27] A new initialization method for clustering categorical data
    Wu, Shu
    Jiang, Qingshan
    Huang, Joshua Zhexue
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2007, 4426 : 972 ff.
  • [28] Clustering Categorical Data Using Hierarchies (CLUCDUH)
    Silahtaroglu, Gökhan
    World Academy of Science, Engineering and Technology, 2009, 56 : 334 - 339
  • [29] DETECTIVE: A decision tree based categorical value clustering and perturbation technique for preserving privacy in data mining
    Islam, MZ
    Brankovic, L
    2005 3rd IEEE International Conference on Industrial Informatics (INDIN), 2005 : 701 - 708
  • [30] Clustering categorical data based on distance vectors
    Zhang, P
    Wang, XG
    Song, PXK
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (473) : 355 - 367