On clustering tree structured data with categorical nature

Cited by: 13
Authors
Boutsinas, B. [1, 2]
Papastergiou, T. [2]
Affiliations
[1] Univ Patras, Dept Business Adm, GR-26500 Rion, Greece
[2] Univ Patras, Artificial Intelligence Res Ctr, GR-26500 Rion, Greece
Keywords
clustering; (dis)similarity measures; data mining
DOI
10.1016/j.patcog.2008.05.023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering consists in partitioning a set of objects into disjoint and homogeneous clusters. For many years, clustering methods have been applied in a wide variety of disciplines and scientific areas. Traditionally, clustering methods deal with numerical data, i.e. objects represented by a conjunction of numerical attribute values. However, nowadays commercial or scientific databases usually contain categorical data, i.e. objects represented by categorical attributes. In this paper we present a dissimilarity measure that is capable of dealing with tree structured categorical data. Thus, it can be used to extend the various versions of the very popular k-means clustering algorithm to deal with such data. We discuss how such an extension can be achieved. Moreover, we show empirically that the proposed dissimilarity measure is accurate compared to other well-known (dis)similarity measures for categorical data. © 2008 Elsevier Ltd. All rights reserved.
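As a rough illustration of how a k-means-style procedure can be adapted to categorical data, the sketch below uses simple matching dissimilarity (the count of mismatching attributes) and per-attribute modes in place of means, in the style of k-modes. This is not the tree structured measure proposed in the paper, which the abstract does not specify; the function names and sample data are hypothetical.

```python
from collections import Counter


def simple_matching(a, b):
    """Count the attribute positions where two categorical objects differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def mode_of(cluster):
    """Per-attribute most frequent value: a categorical stand-in for the mean."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))


def assign(objects, centers):
    """Index of the nearest center for each object (ties go to the lowest index)."""
    return [
        min(range(len(centers)), key=lambda j: simple_matching(obj, centers[j]))
        for obj in objects
    ]


# Hypothetical flat categorical objects (no tree structure).
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
]
centers = [data[0], data[2]]  # assumed initial centers
labels = assign(data, centers)
```

A full clustering loop would alternate `assign` with recomputing each center via `mode_of` until the labels stop changing. Any other dissimilarity function with the same two-argument signature could be substituted for `simple_matching`.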
Pages: 3613-3623
Page count: 11