On clustering tree structured data with categorical nature

Cited by: 13
Authors
Boutsinas, B. [1, 2]
Papastergiou, T. [2]
Affiliations
[1] Univ Patras, Dept Business Adm, GR-26500 Rion, Greece
[2] Univ Patras, Artificial Intelligence Res Ctr, GR-26500 Rion, Greece
Keywords
clustering; (dis)similarity measures; data mining
DOI
10.1016/j.patcog.2008.05.023
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering consists of partitioning a set of objects into disjoint and homogeneous clusters. For many years, clustering methods have been applied in a wide variety of disciplines and scientific areas. Traditionally, clustering methods deal with numerical data, i.e. objects represented by a conjunction of numerical attribute values. However, commercial and scientific databases nowadays usually contain categorical data, i.e. objects represented by categorical attributes. In this paper we present a dissimilarity measure that is capable of dealing with tree-structured categorical data. Thus, it can be used to extend the various versions of the very popular k-means clustering algorithm to deal with such data. We discuss how such an extension can be achieved. Moreover, we show empirically that the proposed dissimilarity measure is accurate compared to other well-known (dis)similarity measures for categorical data. © 2008 Elsevier Ltd. All rights reserved.
Pages: 3613-3623
Number of pages: 11